I was wondering if there is any command that can output the results of an lm model into a data frame in R, like OUTEST in SAS.
Any ideas? I am running multiple models and I want the result to look like this:
Model | alpha | Beta | Rsquared | F | df |
model0 | 8.4 | ... | .... | ..| .. |
model1 | ... | ... | .... | ..| .. |
model2 | ... | ... | .... | ..| .. |
The data I have is the data frame 'ds':
X1 | X2 | Y1 |
.. | .. | .. |
.. | .. | .. |
.. | .. | .. |
.. | .. | .. |
My code is a set of simple lm calls:
model0 <- lm(Y1 ~ X1, ds)
model1 <- lm(Y1 ~ 1, ds)
model2 <- lm(Y1 ~ X1 + X2, ds)
I do exactly the same thing. The difficulty, of course, is that models with different numbers of coefficients would need different numbers of columns, while every row of a data.frame must have the same columns (rbind fails if they differ). So you need the same set of columns for each model.
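One way around this, if you still want every coefficient in its own column, is to take the union of all coefficient names and fill the ones a model lacks with NA. A minimal base-R sketch; the data frame ds here is made-up random data that only borrows the question's column names:

```r
set.seed(1)
# hypothetical stand-in for the question's ds (random values, same column names)
ds <- data.frame(X1 = rnorm(50), X2 = rnorm(50))
ds$Y1 <- 2 + 0.5 * ds$X1 + rnorm(50)

fits <- list(
  model0 = lm(Y1 ~ X1, ds),
  model1 = lm(Y1 ~ 1, ds),
  model2 = lm(Y1 ~ X1 + X2, ds)
)

# union of all coefficient names across the models
all_terms <- unique(unlist(lapply(fits, function(m) names(coef(m)))))

# one row per model; indexing by a missing name yields NA
coef_table <- t(sapply(fits, function(m) coef(m)[all_terms]))
colnames(coef_table) <- all_terms  # missing names came back as NA, so reset them
coef_table <- data.frame(model = names(fits), coef_table,
                         check.names = FALSE, row.names = NULL)
coef_table
```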
Here is the code I use. I normally run it for glm (the glm-specific lines are commented out), but I modified it for lm for you:
models <- c()  # accumulates one row per model
for (i in 1:10) {
  y <- rnorm(100)  # generate some example data for lm
  x <- rnorm(100)
  m <- lm(y ~ x)
  # in case of glm:
  #m <- glm(y ~ x, data = data, family = "quasipoisson")
  #overdispersion <- 1/m$df.residual*sum((data$count-fitted(m))^2/fitted(m))
  s <- summary(m)  # compute the summary once and reuse it
  cf <- s$coefficients  # one row per term: Estimate, Std. Error, t value, Pr(>|t|)
  v.coef <- c(t(cf))  # flatten row by row
  names(v.coef) <- paste(rep(rownames(cf), each = 4), c("coef", "stderr", "t", "p-value"))
  v.model_info <- c(r.squared = s$r.squared, F = unname(s$fstatistic[1]), df.res = s$df[2])
  # in case of glm:
  #v.model_info <- c(overdisp = s$dispersion, res.deviance = m$deviance, df.res = m$df.residual, null.deviance = m$null.deviance, df.null = m$df.null)
  v.all <- c(v.coef, v.model_info)
  models <- rbind(models, cbind(data.frame(model = paste("model", i, sep = "")), t(v.all)))
}
I prefer to take the data from summary(m). To bundle it into a data.frame, use the cbind (column bind) and rbind (row bind) functions.
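Growing models with rbind inside the loop works, but it copies the whole data frame on every iteration. A common alternative, sketched here with the same kind of random example data as the loop above, is to build each one-row data frame inside lapply and bind them all once at the end:

```r
set.seed(42)

rows <- lapply(1:10, function(i) {
  y <- rnorm(100)  # example data, as in the loop above
  x <- rnorm(100)
  s <- summary(lm(y ~ x))
  cf <- s$coefficients
  v.coef <- c(t(cf))  # estimate, stderr, t, p for each term, row by row
  names(v.coef) <- paste(rep(rownames(cf), each = 4),
                         c("coef", "stderr", "t", "p-value"))
  v.all <- c(v.coef, r.squared = s$r.squared,
             F = unname(s$fstatistic[1]), df.res = s$df[2])
  cbind(data.frame(model = paste0("model", i)), t(v.all))
})

models <- do.call(rbind, rows)  # bind all rows in a single call
```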
You can use the coefficients function:
out = coefficients(lm(mpg ~ wt, mtcars))
out
# (Intercept) wt
# 37.285126 -5.344472
out[1]
# (Intercept)
# 37.28513
Or, for a group of lm objects:
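library(plyr)
# assumes model0, model1 and model2 were all fitted as lm(mpg ~ wt, mtcars), hence the identical rows
out = ldply(list(model0, model1, model2), coefficients)
rownames(out) = sprintf('model%d', 0:2)
out
#        (Intercept)        wt
# model0    37.28513 -5.344472
# model1    37.28513 -5.344472
# model2    37.28513 -5.344472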
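If you would rather not depend on plyr, base R can stack the coefficient vectors too. A sketch, assuming (like the output above) three fits that share the same formula; rbind recycles shorter vectors rather than padding them with NA, so this is only safe when every model has the same terms:

```r
# hypothetical: three fits with the same formula, so the coefficient vectors line up
fits <- list(lm(mpg ~ wt, mtcars), lm(mpg ~ wt, mtcars), lm(mpg ~ wt, mtcars))

# rbind the named coefficient vectors into a matrix, then convert
out <- as.data.frame(do.call(rbind, lapply(fits, coefficients)))
rownames(out) <- sprintf('model%d', 0:2)
out
```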
To extend my solution to what you need, you have to:
- Find out how to extract the other information you need from an lm object.
- Write a custom function which returns a one-row data.frame containing all of that information.
- Run it using the ldply syntax I showed.
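Following those three steps, such a function might look roughly like this. The helper name model_row is hypothetical, ds is random stand-in data with the question's column names, and "alpha" and "Beta" are taken to mean the intercept and the X1 coefficient (NA when a model has no X1 term). The sketch binds the rows with base do.call(rbind, ...) so it runs without plyr; ldply(list(model0, model1, model2), model_row) should produce the same data frame.

```r
set.seed(1)
# hypothetical stand-in for the question's ds
ds <- data.frame(X1 = rnorm(50), X2 = rnorm(50))
ds$Y1 <- 2 + 0.5 * ds$X1 + rnorm(50)

model0 <- lm(Y1 ~ X1, ds)
model1 <- lm(Y1 ~ 1, ds)
model2 <- lm(Y1 ~ X1 + X2, ds)

# hypothetical helper: one-row data.frame summarising an lm fit
model_row <- function(m) {
  s     <- summary(m)
  cf    <- coef(m)
  fstat <- s$fstatistic  # NULL for an intercept-only model
  data.frame(
    alpha    = unname(cf["(Intercept)"]),
    Beta     = unname(cf["X1"]),  # NA if X1 is not in the model
    Rsquared = s$r.squared,
    F        = if (is.null(fstat)) NA_real_ else unname(fstat["value"]),
    df       = s$df[2]            # residual degrees of freedom
  )
}

out <- do.call(rbind, lapply(list(model0, model1, model2), model_row))
out <- cbind(Model = sprintf('model%d', 0:2), out)
out
```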