I'm looking for a Tidyverse / broom solution that can solve this puzzle:
Let's say I have different DVs and a specific set of IVS and I want to perform a regression that considers every DV and this specific set of IVs.
I know I can use something like for i in or apply family, but I really want to run that using tidyverse.
The following code works as an example
ds <- data.frame(income = rnorm(100, mean=1000,sd=200),
happiness = rnorm(100, mean = 6, sd=1),
health = rnorm(100, mean=20, sd = 3),
sex = c(0,1),
faculty = c(0,1,2,3))
mod1 <- lm(income ~ sex + faculty, ds)
mod2 <- lm(happiness ~ sex + faculty, ds)
mod3 <- lm(health ~ sex + faculty, ds)
summary(mod1)
summary(mod2)
summary(mod3)
Income, happiness, and health are DVs. Sex and Faculty are IVs and they will be used for all regressions.
That was the closest I found
Let me know If I need to clarify my question.
Thanks.
As you have different dependent variables but the same independent, you can form a matrix of these and pass to lm
.
mod = lm(cbind(income, happiness, health) ~ sex + faculty, ds)
And I think broom::tidy
works
library(broom)
tidy(mod)
# response term estimate std.error statistic p.value
# 1 income (Intercept) 1019.35703873 31.0922529 32.7849205 2.779199e-54
# 2 income sex -54.40337314 40.1399258 -1.3553431 1.784559e-01
# 3 income faculty 19.74808081 17.9511206 1.1001030 2.740100e-01
# 4 happiness (Intercept) 5.97334562 0.1675340 35.6545278 1.505026e-57
# 5 happiness sex 0.05345555 0.2162855 0.2471528 8.053124e-01
# 6 happiness faculty -0.02525431 0.0967258 -0.2610918 7.945753e-01
# 7 health (Intercept) 19.76489553 0.5412676 36.5159396 1.741411e-58
# 8 health sex 0.32399380 0.6987735 0.4636607 6.439296e-01
# 9 health faculty 0.10808545 0.3125010 0.3458723 7.301877e-01
Another method is to gather
the dependent variables and use a grouped data frame to fit the models with do
. This is the method explained in the broom and dplyr vignette.
library(tidyverse)
library(broom)
ds <- data.frame(
income = rnorm(100, mean = 1000, sd = 200),
happiness = rnorm(100, mean = 6, sd = 1),
health = rnorm(100, mean = 20, sd = 3),
sex = c(0, 1),
faculty = c(0, 1, 2, 3)
)
ds %>%
gather(dv_name, dv_value, income:health) %>%
group_by(dv_name) %>%
do(tidy(lm(dv_value ~ sex + faculty, data = .)))
#> # A tibble: 9 x 6
#> # Groups: dv_name [3]
#> dv_name term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 happiness (Intercept) 6.25 0.191 32.7 3.14e-54
#> 2 happiness sex 0.163 0.246 0.663 5.09e- 1
#> 3 happiness faculty -0.172 0.110 -1.56 1.23e- 1
#> 4 health (Intercept) 20.1 0.524 38.4 1.95e-60
#> 5 health sex 0.616 0.677 0.909 3.65e- 1
#> 6 health faculty -0.653 0.303 -2.16 3.36e- 2
#> 7 income (Intercept) 1085. 32.8 33.0 1.43e-54
#> 8 income sex -12.9 42.4 -0.304 7.62e- 1
#> 9 income faculty -25.1 19.0 -1.32 1.89e- 1
Created on 2018-08-01 by the reprex package (v0.2.0).
We can loop through the column names that are dependent variables, use paste
to create the formula
to be passed into lm
and get the summary statistics with tidy
(from broom
)
library(tidyverse)
library(broom)
map(names(ds)[1:3], ~
lm(formula(paste0(.x, "~",
paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
tidy)
If we want it in a single data.frame
with a column identifier for dependent variable,
map_df(set_names(names(ds)[1:3]), ~
lm(formula(paste0(.x, "~",
paste(names(ds)[4:5], collapse=" + "))), data = ds) %>%
tidy, .id = "Dep_Variable")