I am new to R and I want to improve the following script with an *apply function (I have read about apply, but I couldn't manage to use it). I want to use the lm function on multiple response variables (which are columns in a data frame), each regressed on the same set of predictors. I used:
for (i in 1:3) {
  assign(paste0('lm.', names(data[i])), lm(formula = formula(i), data = data))
}
where formula(i) is defined as
formula = function(x)
{
  as.formula(paste(names(data[x]), '~', paste0(names(data[-1:-3]), collapse = '+')), env = parent.frame())
}
Thank you.
If I understand you correctly, you are working with a dataset in which x1, x2 and x3 are covariates, and y1, y2 and y3 are three independent responses. You are trying to fit three linear models, y1 ~ x1 + x2 + x3, y2 ~ x1 + x2 + x3 and y3 ~ x1 + x2 + x3. Currently you loop through y1, y2, y3, fitting one model at a time, and you hope to speed the process up by replacing the for loop with lapply.

You are on the wrong track.
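For concreteness, here is a sketch of the one-model-at-a-time approach on a hypothetical simulated data frame dat (the name, the sample size and the true coefficients are all made up for illustration, with the responses in the first three columns as in the question). It stores the fits in a named list rather than creating lm.y1, lm.y2 and lm.y3 with assign(), and it builds each formula with reformulate() instead of a paste()-based helper; both of those are substitutions made for this sketch.

```r
# Hypothetical simulated data: responses y1, y2, y3 first, covariates x1, x2, x3 after.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y1 <- 1 + dat$x1 + rnorm(n)
dat$y2 <- 2 - dat$x2 + rnorm(n)
dat$y3 <- 0.5 * dat$x3 + rnorm(n)
dat <- dat[c("y1", "y2", "y3", "x1", "x2", "x3")]

responses  <- names(dat)[1:3]
covariates <- names(dat)[-(1:3)]

# Build "y ~ x1 + x2 + x3" for a single response name y.
make_formula <- function(y) {
  reformulate(covariates, response = y)
}

# for loop: one lm() call per response, collected in a named list.
fits_loop <- vector("list", length(responses))
names(fits_loop) <- responses
for (y in responses) {
  fits_loop[[y]] <- lm(make_formula(y), data = dat)
}

# lapply: exactly the same work, still one lm() call per response.
fits_lapply <- lapply(setNames(responses, responses),
                      function(y) lm(make_formula(y), data = dat))

sapply(fits_lapply, coef)  # 4 x 3 matrix: (Intercept), x1, x2, x3 by y1, y2, y3
```

Both versions call lm() three times and give the same coefficients; lapply only changes how the iteration is written, not how much work is done.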
lm() is an expensive operation: each call builds a model frame and a model matrix and computes a QR factorization. Unless your dataset is small, the overhead of the for loop itself is negligible by comparison, and replacing the for loop with lapply gives no performance gain, since lapply still calls lm() once per response.

Since all three models have the same RHS (the right-hand side of ~), they share the same model matrix, so the QR factorization needs to be done only once. lm allows this: put the three responses together with cbind() on the LHS and fit the single model cbind(y1, y2, y3) ~ x1 + x2 + x3.

If you check str() of the resulting fit, you will see that it is not a list of three linear models; instead, it is a single linear model with a single $qr object, but with multiple LHS. So $coefficients, $residuals and $fitted.values are matrices, with one column per response. The resulting linear model has an additional "mlm" class besides the usual "lm" class. I created a special mlm tag on Stack Overflow collecting questions on this theme, summarized by its tag wiki.

If you have many more covariates, you can avoid typing or pasting the formula by using . on the RHS, as in cbind(y1, y2, y3) ~ . (the dot stands for all columns of the data frame not already used in the formula).

Caution: do not write y1 + y2 + y3 ~ x1 + x2 + x3. This treats y = y1 + y2 + y3 as a single response. Use cbind() instead.
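A minimal sketch of the single multi-response fit, on a hypothetical simulated data frame dat with columns x1, x2, x3, y1, y2, y3 (made-up data, generated inside the block so it runs on its own). It checks the "mlm" class and the matrix shape of the coefficients, uses . on the RHS, compares one column against a separate single-response fit, and shows what the y1 + y2 + y3 pitfall actually fits.

```r
# Hypothetical simulated data: covariates x1, x2, x3 and responses y1, y2, y3.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y1 <- 1 + dat$x1 + rnorm(n)
dat$y2 <- 2 - dat$x2 + rnorm(n)
dat$y3 <- 0.5 * dat$x3 + rnorm(n)

# One fit for all three responses: one model matrix, one QR factorization.
fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)
class(fit)            # "mlm" "lm"
dim(coef(fit))        # 4 3: (Intercept), x1, x2, x3 by y1, y2, y3
dim(residuals(fit))   # 100 3

# "." expands to every column not already in the formula,
# so y1, y2, y3 (used inside cbind()) are not pulled in as covariates.
fit_dot <- lm(cbind(y1, y2, y3) ~ ., data = dat)

# Each column of coef(fit) matches the corresponding single-response fit.
fit_y1 <- lm(y1 ~ x1 + x2 + x3, data = dat)
all.equal(coef(fit)[, "y1"], coef(fit_y1))   # TRUE

# Pitfall: this is ONE model whose response is the sum y1 + y2 + y3.
fit_sum <- lm(y1 + y2 + y3 ~ x1 + x2 + x3, data = dat)
class(fit_sum)        # "lm"
```

Because least squares is linear in the response, the coefficients of fit_sum are simply the row sums of coef(fit), which is rarely what you want.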
Follow-up:

So you are programming your formula, i.e. you want to dynamically generate or construct model formulae in a loop. There are many ways to do this, and many Stack Overflow questions are about it. There are two common approaches: reformulate, or paste / paste0 combined with formula / as.formula.

I prefer reformulate for its neatness; however, it does not support multiple LHS in the formula, and it needs some special treatment if you want to transform the LHS. So in the following I use the paste solution.

For your data frame df (responses in the first three columns, covariates in the rest), you can paste the response names into a cbind(...) call for the LHS and join the remaining column names with + for the RHS. A nicer-looking way to construct the LHS is to use sprintf and toString. The same approach works for any data frame, for example the built-in iris dataset.

You can pass this string formula directly to lm, as lm will automatically coerce it to the formula class. Or you can do the coercion yourself using formula (or as.formula).
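A sketch of both string constructions on the built-in iris dataset. Which columns act as responses is a choice made for illustration: Sepal.Length and Sepal.Width as responses, everything else (including the factor Species) as covariates. The helper mlm_formula() is hypothetical; it takes the first k columns of a data frame as responses, so mlm_formula(df, 3) would be the analogue for the question's layout.

```r
responses  <- c("Sepal.Length", "Sepal.Width")
covariates <- setdiff(names(iris), responses)

# paste / paste0: LHS "cbind(a, b)", RHS "c + d + e"
lhs <- paste0("cbind(", paste(responses, collapse = ", "), ")")
rhs <- paste(covariates, collapse = " + ")
f_paste <- paste(lhs, rhs, sep = " ~ ")

# sprintf + toString: toString() joins with ", ", the separator cbind() needs
f_sprintf <- sprintf("cbind(%s) ~ %s", toString(responses), rhs)

# Hypothetical helper: first k columns are responses, the rest are covariates
mlm_formula <- function(d, k) {
  nm <- names(d)
  sprintf("cbind(%s) ~ %s", toString(nm[seq_len(k)]),
          paste(nm[-seq_len(k)], collapse = " + "))
}

f_paste
# [1] "cbind(Sepal.Length, Sepal.Width) ~ Petal.Length + Petal.Width + Species"

# lm() coerces the character string to a formula itself ...
fit1 <- lm(f_sprintf, data = iris)
# ... or do the coercion explicitly with formula() / as.formula()
fit2 <- lm(as.formula(f_sprintf), data = iris)
```

Both fits are the same "mlm" model; the explicit coercion mainly helps when you want to inspect or modify the formula object before fitting.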
This multiple LHS formula is also supported elsewhere in base R, for example by aggregate and aov.
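A short sketch of the same cbind() LHS in aggregate and aov on the built-in iris dataset (the choice of columns is made only for illustration).

```r
# aggregate(): group means of two columns at once, by Species
agg <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris, FUN = mean)
agg

# aov(): one-way ANOVA for two responses sharing the same RHS
fit_aov <- aov(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
class(fit_aov)    # "maov" "aov" "mlm" "lm"
summary(fit_aov)  # prints one ANOVA table per response
```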