Regression of variables in a dataframe

Posted 2019-09-09 21:19

Question:

I have a dataframe:

df = data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50), x4 = rnorm(50))

I would like to regress each variable versus all the other variables, for instance:

fit1 <- lm(x1 ~ ., data = df)
fit2 <- lm(x2 ~ ., data = df)

etc. (Of course, the real dataframe has a lot more variables).

I tried putting them in a loop, but it didn't work. I also tried using lapply but couldn't produce the desired result either. Does anyone know the trick?

Answer 1:

You can use reformulate to build the formulas dynamically:

df = data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50), x4 = rnorm(50))

vars <- names(df)
result <- lapply(vars, function(resp) {
    lm(reformulate(".",resp), data=df)
})

Alternatively, you could use do.call to get "prettier" formulas in the call stored with each model:

vars <- names(df)
result <- lapply(vars, function(resp) {
    do.call("lm", list(reformulate(".",resp), data=quote(df)))
})
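To see what "prettier" means here, a small sketch that fits both variants above and compares the call stored in the first model. The names plain and pretty and the set.seed call are my own additions for illustration, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50), x4 = rnorm(50))
vars <- names(df)

# Variant 1: lm() records the unevaluated expression reformulate(".", resp)
plain <- lapply(vars, function(resp) {
    lm(reformulate(".", resp), data = df)
})

# Variant 2: do.call() evaluates the formula before lm() is called,
# so the stored call contains the formula itself (e.g. x1 ~ .)
pretty <- lapply(vars, function(resp) {
    do.call("lm", list(reformulate(".", resp), data = quote(df)))
})

print(plain[[1]]$call)   # call as recorded by variant 1
print(pretty[[1]]$call)  # call as recorded by variant 2

# The fitted coefficients are the same either way; only the recorded call differs
same_fit <- isTRUE(all.equal(coef(plain[[1]]), coef(pretty[[1]])))
```

The difference only matters for printing and for functions that re-evaluate the stored call, such as update().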

Each of these methods returns a list. You can extract individual models with result[[1]], result[[2]], etc.
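If positional indexing gets awkward with many variables, one option is to name the list by response variable and then pull a summary statistic out of every model at once. This is a sketch on top of the answer's approach; r_squared, x4_coefs and the set.seed call are illustrative additions.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
df <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50), x4 = rnorm(50))

vars <- names(df)
result <- lapply(vars, function(resp) {
    lm(reformulate(".", resp), data = df)
})

# Name each model after its response so it can be looked up by column name
names(result) <- vars

# R-squared of every regression, returned as a named numeric vector
r_squared <- sapply(result, function(fit) summary(fit)$r.squared)

# Coefficients of the model with x4 as the response
x4_coefs <- coef(result[["x4"]])
```

With the names in place, result[["x4"]] and result$x4 both return the model that regresses x4 on the other columns.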



Answer 2:

Or you can try this...

df = data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50), x4 = rnorm(50))   
models = list()

# fit one model per column, using that column as the response
for (i in seq_len(ncol(df))) {
  formula = as.formula(paste(colnames(df)[i], "~ ."))
  models[[i]] = lm(formula, data = df)
}

This stores all the models in a list.

To retrieve a stored model, for example the one with x4 as the response:

#retrieve model - replace modelName with the name of the required column
modelName = "x4"
out = models[[which( colnames(df)== modelName )]]

Output:

Call:
lm(formula = formula, data = df)

Coefficients:
(Intercept)           x1           x2           x3  
   -0.17383      0.07602     -0.09759     -0.23920