Extract model summaries and store them as a new co

Posted 2020-07-10 07:23

Question:

I'm new to the purrr paradigm and am struggling with it.

Following a few sources I have managed to get so far as to nest a data frame, run a linear model on the nested data, extract some coefficients from each lm, and generate a summary for each lm. The last thing I want to do is extract the "r.squared" from the summary (which I would have thought would be the simplest part of what I'm trying to achieve), but for whatever reason I can't get the syntax right.

Here's a MWE of what I have that works:

library(purrr)
library(dplyr)
library(tidyr)

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         sum = map(fit, ~summary))

And here's my attempt to extract the r.squared, which fails:

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         sum = map(fit, ~summary),
         rsq = map_dbl(sum, "r.squared"))
Error in eval(substitute(expr), envir, enclos) : 
  `x` must be a vector (not a closure)

This is superficially similar to the example given on the RStudio site:

mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")

This works; however, I would like the r.squared values to sit in a new column (hence the mutate statement), and I'd like to understand why my code isn't working rather than work around the problem.

EDIT:

Here's a working solution that I came to using the solutions below:

# glance() comes from the broom package, so library(broom) is needed as well
mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(fit, glance),
         r_sq = map_dbl(summary, "r.squared"))

EDIT 2:

So, it turns out that the bug comes from including the tilde in the sum = map(fit, ~summary) line. My guess is that the tilde makes each nested object the summary function itself rather than the object returned by calling summary. I would love an authoritative answer on this if someone wants to chime in.

To be clear, this version of the original code works fine:

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(fit, summary),
         r_sq = map_dbl(summary, "r.squared"))
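
The guess about the tilde can be checked directly. The following is a minimal sketch, assuming only purrr is loaded; the names fits, with_tilde, without_tilde and tilde_with_call are made up for illustration. In purrr's formula shorthand the text after the tilde becomes the body of an anonymous function, so ~summary returns the summary function untouched, while ~summary(.x) actually calls it.

```r
library(purrr)

# One lm per cylinder group, built without the nest()/mutate() pipeline
fits <- map(split(mtcars, mtcars$cyl), ~ lm(mpg ~ wt, data = .x))

# The formula body is the bare name `summary`, so each element is the function itself
with_tilde <- map(fits, ~ summary)

# Passing the function directly, or calling it inside the formula, summarises each model
without_tilde <- map(fits, summary)
tilde_with_call <- map(fits, ~ summary(.x))

is.function(with_tilde[[1]])           # the element is a closure, not a model summary
class(without_tilde[[1]])              # a summary.lm object, which has an r.squared element
map_dbl(tilde_with_call, "r.squared")  # extraction by name works once summary is called
```

This would explain the "not a closure" error: map_dbl(sum, "r.squared") is asked to pull a named element out of a function rather than out of a summary object.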

Answer 1:

To fit in your current pipe, you'd want to use unnest along with map and glance from the broom package.

library(tidyr)
library(dplyr)
library(purrr)  # provides map()
library(broom)

mtcars %>%
  nest(-cyl) %>%
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .))) %>% 
  unnest(map(fit, glance))

You'll get more than just the r-squared, and from there you can use select to drop what you don't need.
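
As a rough sketch of that select step, assuming tidyr 1.0 or later (where nest and unnest take the column explicitly) and the broom package; the names glanced and r_squared_by_cyl are made up for illustration:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

r_squared_by_cyl <- mtcars %>%
  nest(data = -cyl) %>%                       # named list-column, tidyr >= 1.0 style
  mutate(fit = map(data, ~ lm(mpg ~ wt, data = .x)),
         glanced = map(fit, glance)) %>%      # one-row tibble of fit statistics per model
  unnest(glanced) %>%                         # spread those statistics into columns
  select(cyl, r.squared, adj.r.squared)       # keep only the columns that are needed

r_squared_by_cyl
```

Storing the glance output in a named column before unnesting keeps the call compatible with the newer unnest interface, which expects column names rather than an expression.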

If you want to keep the model summaries nested in list-columns:

mtcars %>%
  nest(-cyl) %>% 
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(fit, glance)) 

If you just want to extract a single value from each nested frame, pass the name of that value to map_dbl (rather than using [[ or extract2 as I originally suggested; many thanks for finding that out).

mtcars %>%
  nest(-cyl) %>% 
  mutate(fit = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(fit, glance),
         r_sq = map_dbl(summary, "r.squared"))


Answer 2:

I think for what you'd like to achieve, you are better off using the glance() function from the broom package:

library(broom)
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  do(glance(lm(mpg ~ wt, data = .))) %>%
  select(cyl, r.squared)
#    cyl r.squared
#  <dbl>     <dbl>
#1     4 0.5086326
#2     6 0.4645102
#3     8 0.4229655


Answer 3:

There must be a better way, but here is my attempt with pipes:

mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared") %>% 
  list() %>% 
  as.data.frame(col.names = "r.squared") %>% 
  add_rownames(var = "cyl")

# # A tibble: 3 × 2
#     cyl r.squared
#   <chr>     <dbl>
# 1     4 0.5086326
# 2     6 0.4645102
# 3     8 0.4229655

Note: You might get the warning below.

Warning message: Deprecated, use tibble::rownames_to_column() instead.
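
One way to avoid that warning, sketched here under the assumption that the tibble package is installed, is to use the replacement the message suggests. The names r_sq, r_sq_table and r_sq_tibble are made up for illustration, and enframe is shown only as a possible shortcut.

```r
library(purrr)
library(tibble)

# Named numeric vector of r.squared values, one per cylinder group
r_sq <- split(mtcars, mtcars$cyl) %>%
  map(~ lm(mpg ~ wt, data = .x)) %>%
  map(summary) %>%
  map_dbl("r.squared")

# data.frame() turns the vector's names ("4", "6", "8") into row names,
# which rownames_to_column() then moves into a regular column
r_sq_table <- data.frame(r.squared = r_sq) %>%
  rownames_to_column(var = "cyl")

# enframe() builds the same two columns directly from the named vector
r_sq_tibble <- enframe(r_sq, name = "cyl", value = "r.squared")
```

Both results hold cyl as a character column, as in the output shown above.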