Combining cbind and paste in linear model

Posted 2019-06-22 08:22

Question:

I would like to know how to write an lm formula that lets me use paste together with cbind for multiple multivariate regression.

Example

In my model I have a set of variables, which correspond to the simple example below:

data(mtcars)
depVars <- paste("mpg", "disp")
indepVars <- paste("qsec", "wt", "drat")

Problem

I would like to create a model with my depVars and indepVars. The model, typed by hand, would look like this:

modExample <- lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)

I'm interested in generating the same formula without referring to variable names and only using depVars and indepVars vectors defined above.


Attempt 1

For example, what I had in mind would correspond to:

mod1 <- lm(formula = formula(paste(cbind(paste(depVars, collapse = ",")), " ~ ",
                                   indepVars)), data = mtcars)

Attempt 2

I tried this as well:

mod2 <- lm(formula = formula(cbind(depVars), paste(" ~ ",
                                                   paste(indepVars, 
                                                         collapse = " + "))),
           data = mtcars)

Side notes

  • I found a number of good examples of how to use paste with formula, but I would like to know how I can combine it with cbind.
  • This is mostly a syntax question; in my real data I have a number of variables I would like to introduce to the model, and making use of the previously generated vectors is more parsimonious and makes the code more presentable. In effect, I'm only interested in creating a formula object that contains cbind with variable names taken from one vector and the remaining variables taken from another vector.
  • In a word, I want to arrive at the formula in modExample without having to type variable names.

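Answer 1:

I think this works. Note that the variable names are stored with c() rather than paste(), so that each name is a separate element of the vector.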

data(mtcars)
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

lm(formula(paste('cbind(',
                 paste(depVars, collapse = ','),
                 ') ~ ',
                 paste(indepVars, collapse = '+'))), data = mtcars)
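
The same string-building step can be wrapped in a small helper. This is only a sketch: the function name make_mv_formula is invented here for illustration and is not part of base R or of the answer above. It also shows why the vectors need c() rather than paste(): paste("mpg", "disp") returns the single string "mpg disp", not two names.

```r
# Hypothetical helper (name invented for this sketch): build a
# multivariate formula from two character vectors of variable names.
make_mv_formula <- function(dep, indep) {
  lhs <- paste0("cbind(", paste(dep, collapse = ", "), ")")
  rhs <- paste(indep, collapse = " + ")
  as.formula(paste(lhs, "~", rhs))
}

depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

# paste() without collapse glues its arguments into one string,
# whereas c() keeps one name per element.
length(paste("mpg", "disp"))  # 1
length(depVars)               # 2

fo <- make_mv_formula(depVars, indepVars)
fit <- lm(fo, data = mtcars)
```

Because fo is passed to lm by name here, the printed Call shows formula = fo rather than the expanded formula.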


Answer 2:

All the solutions below use these definitions:

depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

1) character string formula. Create a character string representing the formula and then run lm using do.call. Note that the formula shown in the Call line of the output is written out in full rather than appearing as the variable name fo.

fo <- sprintf("cbind(%s) ~ %s", toString(depVars), paste(indepVars, collapse = "+"))
do.call("lm", list(fo, quote(mtcars)))

giving:

Call:
lm(formula = "cbind(mpg, disp) ~ qsec+wt+drat", data = mtcars)

Coefficients:
             mpg       disp    
(Intercept)   11.3945  452.3407
qsec           0.9462  -20.3504
wt            -4.3978   89.9782
drat           1.6561  -41.1148
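
To see what do.call changes, the stored call of a direct lm call can be compared with the do.call version. This is a minimal sketch; the names fit_plain and fit_docall are chosen here for illustration only.

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")
fo <- sprintf("cbind(%s) ~ %s", toString(depVars), paste(indepVars, collapse = "+"))

# Direct call: lm() records the unevaluated argument, so the stored
# call refers to the variable name fo.
fit_plain <- lm(fo, data = mtcars)

# do.call(): the list elements are evaluated first, so the stored call
# contains the formula string itself; quote(mtcars) keeps the data
# argument as a name instead of embedding the whole data frame.
fit_docall <- do.call("lm", list(fo, quote(mtcars)))

fit_plain$call$formula   # the symbol fo
fit_docall$call$formula  # the character string held in fo
```

Both fits have the same coefficients; only the recorded call differs.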

1a) This would also work:

fo <- sprintf("cbind(%s) ~.", toString(depVars))
do.call("lm", list(fo, quote(mtcars[c(depVars, indepVars)])))

giving:

Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars[c(depVars, 
    indepVars)])

Coefficients:
             mpg       disp    
(Intercept)   11.3945  452.3407
qsec           0.9462  -20.3504
wt            -4.3978   89.9782
drat           1.6561  -41.1148

2) reformulate. @akrun and @Konrad, in the comments below the question, suggest using reformulate. This approach produces a "formula" object, whereas the ones above produce a character string as the formula. (If a formula object were desired for the solutions above, fo <- formula(fo) would convert it.) Note that the response argument to reformulate must be a call object and not a character string; otherwise reformulate will interpret the character string as the name of a single variable.

fo <- reformulate(indepVars, parse(text = sprintf("cbind(%s)", toString(depVars)))[[1]])
do.call("lm", list(fo, quote(mtcars)))

giving:

Call:
lm(formula = cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)

Coefficients:
             mpg       disp    
(Intercept)   11.3945  452.3407
qsec           0.9462  -20.3504
wt            -4.3978   89.9782
drat           1.6561  -41.1148
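
As a variation on the parse step, the cbind call can also be assembled directly from symbols. This is a sketch of an alternative, not the approach suggested in the comments; lhs is a name chosen here for illustration.

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

# Turn each variable name into a symbol and assemble the call
# cbind(mpg, disp) without going through parse(text = ...).
lhs <- as.call(c(as.name("cbind"), lapply(depVars, as.name)))

# reformulate() accepts a call as the response and builds the formula.
fo <- reformulate(indepVars, response = lhs)
fit <- do.call("lm", list(fo, quote(mtcars)))
```

Building the call from symbols avoids pasting code together as text, which can matter when variable names are not syntactically valid R names.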

3) lm.fit. Another way, which does not use a formula at all, is:

m <- as.matrix(mtcars)
fit <- lm.fit(cbind(1, m[, indepVars]), m[, depVars])

The output is a list with these components:

> names(fit)
[1] "coefficients"  "residuals"     "effects"       "rank"         
[5] "fitted.values" "assign"        "qr"            "df.residual"
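
As a quick check, the lm.fit coefficients can be compared with those from the hand-typed formula. This is a sketch; naming the column of ones "(Intercept)" and the object names X and ref are choices made here, not part of the answer above.

```r
depVars <- c("mpg", "disp")
indepVars <- c("qsec", "wt", "drat")

m <- as.matrix(mtcars)

# Design matrix: a column of ones for the intercept plus the predictors.
# Naming the column is optional; it only labels the intercept row.
X <- cbind("(Intercept)" = 1, m[, indepVars])
fit <- lm.fit(X, m[, depVars])

# Reference fit using the formula typed by hand.
ref <- lm(cbind(mpg, disp) ~ qsec + wt + drat, data = mtcars)

# Compare the numeric values only, ignoring dimnames.
all.equal(fit$coefficients, coef(ref), check.attributes = FALSE)
```

Note that lm.fit returns a plain list rather than an object of class "lm", so methods such as summary for lm objects do not apply to it directly.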