Use lapply for multiple regression with formula ch

Posted 2020-03-26 08:23

Question:

I have seen an example of list apply (lapply) that works nicely: it takes a list of data objects and returns a list of regression outputs, which can then be passed to stargazer for nicely formatted output. The example comes from the question "Using stargazer with a list of lm objects created by lapply-ing over a split data.frame":

library(MASS)
library(stargazer)
data(Boston)

by.river <- split(Boston, Boston$chas)
class(by.river)

fit <- lapply(by.river, function(dd) lm(crim ~ indus, data = dd))
stargazer(fit, type = "text")

What I would like to do is the reverse: instead of passing a list of data sets and running the same regression on each one (as above), I want to pass a list of variables and run different regressions on the same data set. Written out longhand, it would look like this:

fit2 <- vector(mode = "list", length = 2)
fit2[[1]] <- lm(nox ~ indus, data = Boston)
fit2[[2]] <- lm(crim ~ indus, data = Boston)
stargazer(fit2, type = "text")

I tried the following with lapply, but it doesn't work. Where did I go wrong?

myvarc <- c("nox","crim")
class(myvarc)
myvars <- as.list(myvarc)
class(myvars)
fit <- lapply(myvars, function(dvar)lm(dvar ~ indus,data=Boston))
stargazer(fit, type = "text")
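A small sketch of why the attempt above fails (an illustration added here, not part of the original question): a formula stores symbols as written, so `dvar ~ indus` mentions a variable literally named `dvar`. When `lm()` evaluates it, `dvar` is not a column of Boston, so it resolves to the one-element string "nox" from the function's environment rather than to the nox column. Base R's `all.vars()` makes the difference visible:

```r
dvar <- "nox"

# A formula records symbols, not their values: this one mentions `dvar`.
f_wrong <- dvar ~ indus
all.vars(f_wrong)  # "dvar"  "indus"

# Pasting the string first puts the value ("nox") into the formula.
f_right <- as.formula(paste(dvar, "~ indus"))
all.vars(f_right)  # "nox"   "indus"
```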

Answer 1:

This should work:

fit <- lapply(myvars, function(dvar) lm(eval(paste0(dvar, " ~ indus")), data = Boston))

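As far as I can tell, `lm()` accepts the formula as a character string and converts it to a formula internally, so the `eval()` wrapper in the answer above is not strictly needed. A quick check, assuming the MASS package (which ships the Boston data) is installed:

```r
library(MASS)  # provides the Boston data set

# The same model, specified once as a string and once as a formula
fit_chr <- lm("nox ~ indus", data = Boston)
fit_frm <- lm(nox ~ indus, data = Boston)

# The coefficients should be identical
all.equal(coef(fit_chr), coef(fit_frm))  # TRUE
```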

Answer 2:

Consider building the formulas dynamically from strings:

fit <- lapply(myvars, function(dvar)
    lm(as.formula(paste0(dvar, " ~ indus")), data = Boston))

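A variant of the same idea, sketched under the assumption that MASS is installed: it swaps `paste0()` for base R's `reformulate()` (my substitution, not what the answer uses) and names the list with `setNames()` so each model can be retrieved by its response variable. The resulting list can be passed to `stargazer(fits, type = "text")` as in the question.

```r
library(MASS)  # provides the Boston data set

myvarc <- c("nox", "crim")

# setNames() gives the list the names "nox" and "crim";
# reformulate() builds the formula <dvar> ~ indus from strings.
fits <- lapply(setNames(myvarc, myvarc), function(dvar)
  lm(reformulate("indus", response = dvar), data = Boston))

# Slope on indus from each model, labelled by response variable
sapply(fits, function(m) coef(m)[["indus"]])
```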

Answer 3:

You can also use a dplyr and purrr approach: keep everything in a tibble and pull out what you need when you need it. Functionally, it is no different from the lapply methods.

library(dplyr)
library(purrr)
library(MASS)
library(stargazer)

var_tibble <- tibble(vars = c("nox", "crim"), data = list(Boston))

analysis <- var_tibble %>%
  mutate(models = map2(data, vars, ~lm(as.formula(paste0(.y, " ~ indus")), data = .x))) %>%
  mutate(tables = map2(models, vars, ~stargazer(.x, type = "text", dep.var.labels.include = FALSE, column.labels = .y)))


Answer 4:

You can also use get():

# make a list of the response variables
list_x <- list("nox", "crim")

# create the regression function; get(x) looks up the Boston column named by x
my_reg <- function(x) { lm(get(x) ~ indus, data = Boston) }

# run the regressions
results <- lapply(list_x, my_reg)
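One trade-off of the get() approach, shown in the sketch below (the helper names `reg_get` and `reg_string` are hypothetical, and MASS is assumed to be installed): the model records the literal expression get(x) instead of the column name, so anything that reads the formula, such as a regression table, sees get(x). Building the formula from the string keeps the real name while producing the same estimates.

```r
library(MASS)  # provides the Boston data set

# Hypothetical helpers for comparison: one uses get(), one builds the formula
reg_get    <- function(x) lm(get(x) ~ indus, data = Boston)
reg_string <- function(x) lm(reformulate("indus", response = x), data = Boston)

m_get    <- reg_get("nox")
m_string <- reg_string("nox")

deparse(formula(m_get))     # "get(x) ~ indus"
deparse(formula(m_string))  # "nox ~ indus"

# The fitted coefficients are the same either way
all.equal(unname(coef(m_get)), unname(coef(m_string)))  # TRUE
```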