How to succinctly write a formula with many variab

Suppose I have a response variable and a data containing three covariates (as a toy example):

y = c(1,4,6)
d = data.frame(x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))

I want to fit a linear regression to the data:

fit = lm(y ~ d$x1 + d$x2 + d$y2)

Is there a way to write the formula, so that I don't have to write out each individual covariate? For example, something like

fit = lm(y ~ d)

(I want each variable in the data frame to be a covariate.) I'm asking because I actually have 50 variables in my data frame, so I want to avoid writing out x1 + x2 + x3 + etc.

标签： r dataframe glm lm

6条回答

若你有天会懂

2楼-- · 2019-01-01 02:08

There is a special identifier that one can use in a formula to mean all the variables, it is the . identifier.

y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)

You can also do things like this, to use all variables bar one:

mod <- lm(y ~ . - x3, data = d)

Technically, . means all variables not already mentioned in the formula. For example

lm(y ~ x1 * x2 + ., data = d)

where . would only reference x3 as x1 and x2 are already in the formula.

0人赞添加讨论(0) 举报

步步皆殇っ

3楼-- · 2019-01-01 02:21

An extension of juba's method is to use reformulate, a function which is explicitly designed for such a task.

## Create a formula for a model with a large number of variables:
xnam <- paste("x", 1:25, sep="")

reformulate(xnam, "y")
y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + 
    x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 + x20 + x21 + 
    x22 + x23 + x24 + x25

For the example in the OP, the easiest solution here would be

# add y variable to data.frame d
d <- cbind(y, d)
reformulate(names(d)[-1], names(d[1]))
y ~ x1 + x2 + x3

mod <- lm(reformulate(names(d)[-1], names(d[1])), data=d)

Note that adding the dependent variable to the data.frame in d <- cbind(y, d) is preferred not only because it allows for the use of reformulate, but also because it allows for future use of the lm object in functions like predict.

0人赞添加讨论(0) 举报

孤独寂梦人

4楼-- · 2019-01-01 02:24

Yes of course, just add the response y as first column in the dataframe and call lm() on it:

d2<-data.frame(y,d)
> d2
  y x1 x2 x3
1 1  4  3  4
2 4 -1  9 -4
3 6  3  8 -2
> lm(d2)

Call:
lm(formula = d2)

Coefficients:
(Intercept)           x1           x2           x3  
    -5.6316       0.7895       1.1579           NA

Also, my information about R points out that assignment with <- is recommended over =.

0人赞添加讨论(0) 举报

听够珍惜

5楼-- · 2019-01-01 02:26

I build this solution, reformulate does not take care if variable names have white spaces.

add_backticks = function(x) {
    paste0("`", x, "`")
}

x_lm_formula = function(x) {
    paste(add_backticks(x), collapse = " + ")
}

build_lm_formula = function(x, y){
    if (length(y)>1){
        stop("y needs to be just one variable")
    }
    as.formula(        
        paste0("`",y,"`", " ~ ", x_lm_formula(x))
    )
}

# Example
df <- data.frame(
    y = c(1,4,6), 
    x1 = c(4,-1,3), 
    x2 = c(3,9,8), 
    x3 = c(4,-4,-2)
    )

# Model Specification
columns = colnames(df)
y_cols = columns[1]
x_cols = columns[2:length(columns)]
formula = build_lm_formula(x_cols, y_cols)
formula
# output
# "`y` ~ `x1` + `x2` + `x3`"

# Run Model
lm(formula = formula, data = df)
# output
Call:
    lm(formula = formula, data = df)

Coefficients:
    (Intercept)           x1           x2           x3  
        -5.6316       0.7895       1.1579           NA

```

0人赞添加讨论(0) 举报

刘海飞了

6楼-- · 2019-01-01 02:34

A slightly different approach is to create your formula from a string. In the formula help page you will find the following example :

## Create a formula for a model with a large number of variables:
xnam <- paste("x", 1:25, sep="")
fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+")))

Then if you look at the generated formula, you will get :

R> fmla
y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + 
    x12 + x13 + x14 + x15 + x16 + x17 + x18 + x19 + x20 + x21 + 
    x22 + x23 + x24 + x25

0人赞添加讨论(0) 举报

人气声优

7楼-- · 2019-01-01 02:35

You can check the package leaps and in particular the function regsubsets() functions for model selection. As stated in the documentation:

Model selection by exhaustive search, forward or backward stepwise, or sequential replacement

0人赞添加讨论(0) 举报

How to succinctly write a formula with many variab

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间