I'm running a regression in the form
reg=lm(y ~ x1+x2+x3+z1,data=mydata)
In place of the last term, z1, I want to loop through a set of different variables, z1 through z10, running a regression for each with it as the last term. E.g. in the second run I want to use
reg=lm(y ~ x1+x2+x3+z2,data=mydata)
and in the third run:
reg=lm(y ~ x1+x2+x3+z3,data=mydata)
How can I automate this by looping through the list of z-variables?
Depending on what your final goal is, it can be much faster to fit a base model, update it with add1, and extract the F-test/AIC you want. See also ?update for refitting the model.
With this dummy data:
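A minimal sketch of such dummy data, assuming simulated normal columns named as in the question (the values, the seed, and the object names base, candidates and reg_z1 are illustrative assumptions, not anyone's original code). The last lines also show how the add1 and update route mentioned earlier might look on the same data:

```r
# Simulated stand-in data (assumption: values and seed are illustrative only)
set.seed(1)
n <- 100
mydata <- data.frame(
  y  = rnorm(n),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
  z1 = rnorm(n), z2 = rnorm(n)
)

# Fit the base model once, without any z term
base <- lm(y ~ x1 + x2 + x3, data = mydata)

# add1() considers each candidate term in turn and reports AIC and an F-test
candidates <- add1(base, scope = ~ . + z1 + z2, test = "F")
print(candidates)

# update() refits the model with one extra term when the full fit is needed
reg_z1 <- update(base, . ~ . + z1)
```

The add1 table has one row per candidate term, so it is enough when only the comparison statistics are wanted; update is the fallback when a complete lm object is needed.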
You could get your list of two lm objects by iterating through those two columns and substituting each of them as an argument into the lm call.
As Alex notes below, it's preferable to pass the names through the formula, rather than the actual data columns as I have done here.
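The column-wise iteration described above could be sketched roughly as follows. The simulated data and the name models are assumptions made for illustration; the lapply call passes each z column itself, not its name, into the model:

```r
# Simulated stand-in data (assumption: illustrative values only)
set.seed(1)
n <- 100
mydata <- data.frame(
  y  = rnorm(n),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
  z1 = rnorm(n), z2 = rnorm(n)
)

# Iterate over the z columns; each column vector is bound to `z`
# and picked up by the formula from the function's environment
models <- lapply(mydata[, c("z1", "z2")], function(z) {
  lm(y ~ x1 + x2 + x3 + z, data = mydata)
})

# One fitted model per column, named after the column
names(models)
```

One side effect of passing the data instead of the name is that the last coefficient is labelled z in every model, so the list names are the only record of which column was used.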
Here's a different approach using packages from the dplyr / tidyr family. It restructures the data to a long form, then uses group_by() from the dplyr package instead of lapply().
This converts the data to a long format using gather, where the z-values occupy the same column. use_series() from the magrittr package returns the list of lm objects instead of a data.frame. If you load the broom package, you can extract the model coefficients in the same pipeline of code.
Data:
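A minimal sketch of this pipeline on simulated data, assuming the dplyr, tidyr, magrittr and broom packages are installed. The data values and the names long, z_name, z_value, mod, models and coefs are illustrative assumptions. gather() and do() still work but have been superseded by pivot_longer() and newer dplyr verbs, so treat this as one possible form, not the only one:

```r
library(dplyr)
library(tidyr)
library(magrittr)
library(broom)

# Simulated stand-in data (assumption: illustrative values only)
set.seed(1)
n <- 100
mydata <- data.frame(
  y  = rnorm(n),
  x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
  z1 = rnorm(n), z2 = rnorm(n)
)

# Long format: the z values share one column, labelled by z_name
long <- mydata %>% gather(z_name, z_value, z1, z2)

# One model per z variable; use_series() pulls out the list column
models <- long %>%
  group_by(z_name) %>%
  do(mod = lm(y ~ x1 + x2 + x3 + z_value, data = .)) %>%
  use_series(mod)

# Same grouping, but returning tidy coefficient tables instead of models
coefs <- long %>%
  group_by(z_name) %>%
  do(tidy(lm(y ~ x1 + x2 + x3 + z_value, data = .)))
```

Here models is a plain list of lm fits, while coefs is a single data frame with one row per coefficient per z variable.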
While what Sam has provided works and is a good solution, I would personally prefer to go about it slightly differently. His answer has already been accepted, so I'm just posting this for the sake of completeness.
Rather than looping over the actual columns of the data frame, this loops only over a character vector of their names. This provides some speed improvements as fewer things are copied between iterations.
The difference is fairly small for this toy example, but as the number of observations and predictors increases, the difference will likely become more pronounced.
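Looping over names could look something like the sketch below. The simulated data and the names z_names and models are assumptions for illustration; reformulate() from the stats package builds each formula from strings, and as.formula(paste(...)) would be an equivalent choice:

```r
# Simulated stand-in data with z1 through z10 (assumption: illustrative values)
set.seed(1)
n <- 100
mydata <- data.frame(y = rnorm(n), x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
for (j in 1:10) mydata[[paste0("z", j)]] <- rnorm(n)

# Only the names are iterated over, not the columns themselves
z_names <- paste0("z", 1:10)

models <- lapply(z_names, function(z) {
  # Builds y ~ x1 + x2 + x3 + <z> from strings
  f <- reformulate(c("x1", "x2", "x3", z), response = "y")
  lm(f, data = mydata)
})
names(models) <- z_names

# Each model keeps the real variable name in its coefficients
names(coef(models$z3))
```

Because the formula carries the actual column name, each fitted model reports z1, z2, and so on in its coefficient table, which makes the results easier to tell apart.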