How do you pull out the p-value (for the significance of the coefficient of the single explanatory variable being non-zero) and R-squared value from a simple linear regression model? For example...
x = cumsum(c(0, runif(100, -1, +1)))
y = cumsum(c(0, runif(100, -1, +1)))
fit = lm(y ~ x)
summary(fit)
I know that summary(fit) displays the p-value and R-squared value, but I want to be able to stick these into other variables.
While both of the answers above are good, the procedure for extracting parts of objects is more general.
In many cases, functions return lists, and the individual components can be inspected using str(), which will print the components along with their names. You can then access them using the $ operator, i.e. myobject$componentname.
In the case of lm objects, there are a number of predefined methods one can use, such as coef(), resid(), summary(), etc., but you won't always be so lucky.
r-squared: You can return the r-squared value directly from the summary object: summary(fit)$r.squared. See names(summary(fit)) for a list of all the items you can extract directly.
Model p-value: If you want to obtain the p-value of the overall regression model, this blog post outlines a function to return the p-value:
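The blog post's function is not reproduced in this thread. As a minimal sketch of the same idea, assuming the F-statistic stored in summary(fit)$fstatistic is what you want to test, something like the following should work. The helper name model_p_value is made up for illustration and is not taken from the blog post.

```r
# Hypothetical helper: overall model p-value from the F-statistic of an lm fit.
model_p_value <- function(model) {
  stopifnot(inherits(model, "lm"))
  f <- summary(model)$fstatistic  # named vector: value, numdf, dendf
  # Upper tail of the F distribution gives the p-value of the overall model.
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

set.seed(1)  # seed added only so the sketch is reproducible
x <- cumsum(c(0, runif(100, -1, +1)))
y <- cumsum(c(0, runif(100, -1, +1)))
fit <- lm(y ~ x)

p_model <- model_p_value(fit)
p_model
```

The result is a plain numeric value, so it can be stored in any variable or passed on to other code.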
In the case of a simple regression with one predictor, the model p-value and the p-value for the coefficient will be the same.
Coefficient p-values: If you have more than one predictor, then the above will return the model p-value, and the p-value for coefficients can be extracted using:
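The code that followed here is missing from this thread. A minimal sketch, assuming the usual layout of the coefficients matrix in the summary object (fourth column holds the p-values); the two-predictor model and the names x1, x2 and fit2 are made up for illustration:

```r
set.seed(42)  # seed added only so the sketch is reproducible
x1 <- rnorm(50)
x2 <- rnorm(50)
y  <- 1 + 2 * x1 + rnorm(50)
fit2 <- lm(y ~ x1 + x2)

# One p-value per coefficient, named by term: (Intercept), x1, x2.
coef_p <- summary(fit2)$coefficients[, 4]
coef_p
```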
Alternatively, you can grab the p-value of coefficients from the anova(fit) object in a similar fashion to the summary object above.
Notice that summary(fit) generates an object with all the information you need. The beta, se, t and p vectors are stored in it. Get the p-values by selecting the 4th column of the coefficients matrix (stored in the summary object):
Try str(summary(fit)) to see all the info that this object contains.
Edit: I had misread Chase's answer which basically tells you how to get to what I give here.
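To make the two routes mentioned above concrete, here is a rough sketch that pulls the beta, se, t and p vectors out of the coefficients matrix and reads the same p-value from the anova table. The variable names (coefs, beta, se, t_val, p_val, p_anova) are only illustrative.

```r
set.seed(1)  # seed added only so the sketch is reproducible
x <- cumsum(c(0, runif(100, -1, +1)))
y <- cumsum(c(0, runif(100, -1, +1)))
fit <- lm(y ~ x)

# Coefficients matrix: one row per term, four columns.
coefs <- summary(fit)$coefficients
beta  <- coefs[, 1]  # estimates
se    <- coefs[, 2]  # standard errors
t_val <- coefs[, 3]  # t statistics
p_val <- coefs[, 4]  # p-values

# anova table: first row is the term x, second row is Residuals.
p_anova <- anova(fit)[["Pr(>F)"]][1]
```

With a single predictor, p_anova and p_val["x"] should agree, since the F test and the t test on the slope are equivalent in that case.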
Another option is to use the cor.test function, instead of lm:
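The original snippet is missing here, so this is a sketch rather than the answerer's exact code. It assumes the default Pearson test, for which the p-value matches the slope p-value of the simple regression and the squared correlation matches R-squared.

```r
set.seed(1)  # seed added only so the sketch is reproducible
x <- cumsum(c(0, runif(100, -1, +1)))
y <- cumsum(c(0, runif(100, -1, +1)))

ct <- cor.test(x, y)                 # Pearson correlation test by default
p_value   <- ct$p.value              # p-value of the test
r_squared <- unname(ct$estimate)^2   # squared correlation coefficient
```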
I came across this question while exploring suggested solutions for a similar problem; I presume that for future reference it may be worthwhile to update the available list of answers with a solution utilising the broom package.
Sample code
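The sample code did not survive in this thread. A minimal sketch of what a broom-based approach could look like, assuming the broom package is installed; tidy() returns one row per coefficient and glance() returns a one-row model summary. The printed values depend on the random data, so none are shown.

```r
library(broom)  # assumes broom is installed

set.seed(1)  # seed added only so the sketch is reproducible
x <- cumsum(c(0, runif(100, -1, +1)))
y <- cumsum(c(0, runif(100, -1, +1)))
fit <- lm(y ~ x)

coef_table  <- tidy(fit)    # columns include term, estimate, std.error, statistic, p.value
model_table <- glance(fit)  # columns include r.squared, adj.r.squared, p.value

# Both results are data frames, so values can be pulled out by column name.
r_squared <- model_table$r.squared
p_slope   <- coef_table$p.value[coef_table$term == "x"]
```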
Results
Side notes
I find the glance function useful as it neatly summarises the useful values. As an added benefit, the results are stored as a data.frame, which makes further manipulation easy:
Extension of @Vincent's answer:
For lm() generated models:
For gls() generated models:
To isolate an individual p-value itself, you'd add a row number to the code:
For example to access the p-value of the intercept in both model summaries:
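The snippets for these steps are missing from the thread. The following is a sketch under the assumption that gls() comes from the nlme package and that its summary stores the coefficient table in tTable; the object names fit_lm and fit_gls are made up for illustration.

```r
library(nlme)  # provides gls(); ships with standard R installations

set.seed(1)  # seed added only so the sketch is reproducible
x <- cumsum(c(0, runif(100, -1, +1)))
y <- cumsum(c(0, runif(100, -1, +1)))
fit_lm  <- lm(y ~ x)
fit_gls <- gls(y ~ x)

# All coefficient p-values: 4th column of each summary table.
p_lm  <- summary(fit_lm)$coefficients[, 4]
p_gls <- summary(fit_gls)$tTable[, 4]

# A single p-value: add the row number (row 1 is the intercept).
p_int_lm  <- summary(fit_lm)$coefficients[1, 4]
p_int_gls <- summary(fit_gls)$tTable[1, 4]
```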
Note, you can replace the column number with the column name in each of the above instances:
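As a sketch of the column-name variant (the original snippet is missing here): the p-value column is called "Pr(>|t|)" in the lm summary and "p-value" in the gls summary. The object names fit_lm and fit_gls are again only illustrative, and gls() is assumed to come from nlme.

```r
library(nlme)  # provides gls()

set.seed(1)  # seed added only so the sketch is reproducible
x <- cumsum(c(0, runif(100, -1, +1)))
y <- cumsum(c(0, runif(100, -1, +1)))
fit_lm  <- lm(y ~ x)
fit_gls <- gls(y ~ x)

# Same intercept p-values as with column number 4, selected by name instead.
p_int_lm  <- summary(fit_lm)$coefficients[1, "Pr(>|t|)"]
p_int_gls <- summary(fit_gls)$tTable[1, "p-value"]
```

Selecting by name is a little more robust, since it does not rely on the column order staying the same.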
If you're still unsure of how to access a value from the summary table, use str() to figure out the structure of the summary table:
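The call itself is missing from the thread; for an lm fit, a sketch of it could be:

```r
set.seed(1)  # seed added only so the sketch is reproducible
x <- cumsum(c(0, runif(100, -1, +1)))
y <- cumsum(c(0, runif(100, -1, +1)))
fit <- lm(y ~ x)

str(summary(fit))                        # prints every component with its type and size
component_names <- names(summary(fit))   # just the component names
component_names
```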