How do you pull out the p-value (for the significance of the coefficient of the single explanatory variable being non-zero) and R-squared value from a simple linear regression model? For example...
x = cumsum(c(0, runif(100, -1, +1)))
y = cumsum(c(0, runif(100, -1, +1)))
fit = lm(y ~ x)
summary(fit)
I know that summary(fit) displays the p-value and R-squared value, but I want to be able to stick these into other variables.
You can see the structure of the object returned by summary() by calling str(summary(fit)). Each piece can be accessed using $. The p-value for the F statistic is more easily had from the object returned by anova.

Concisely, you can do this:
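A minimal sketch along those lines, using the fit from the question:

r2 <- summary(fit)$r.squared              # R-squared of the model
pval <- summary(fit)$coefficients[2, 4]   # p-value for the slope of x ("Pr(>|t|)" column)
fpval <- anova(fit)[["Pr(>F)"]][1]        # overall F-test p-value, from the anova table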
This is the easiest way to pull the p-values:
Use something along these lines:
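summary(fit)$coefficients[num, 4]   # column 4 of the coefficients matrix is "Pr(>|t|)"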
where num denotes the row of the coefficients matrix; which row depends on how many features you have in your model and which one you want the p-value for. For example, if you have only one explanatory variable, the p-value for the intercept is at [1, 4] and the one for your actual variable is at [2, 4], so your num will be 2.

I used this lmp function quite a lot of times.
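For context, such an lmp helper is typically a small function that computes the model's overall p-value from the F statistic reported by summary(); a sketch:

lmp <- function(modelobject) {
    # overall p-value of an lm fit, from its F statistic
    if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
    f <- summary(modelobject)$fstatistic   # c(value, numdf, dendf)
    p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
    attributes(p) <- NULL                  # drop the names so a bare number is returned
    p
}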
And at one point I decided to add new features to enhance data analysis. I am not an expert in R or statistics, but people usually look at several pieces of information from a linear regression: the p-value, the coefficients, and a measure of fit such as R-squared.
Let's have an example. Here is a reproducible example with different variables:
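What such an extended helper might look like (the name lm_info and the exact set of returned values are illustrative, not the original answer's code):

lm_info <- function(modelobject) {
    # collect the overall p-value, R-squared, and coefficients of an lm fit
    if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
    s <- summary(modelobject)
    f <- s$fstatistic
    list(p.value      = unname(pf(f[1], f[2], f[3], lower.tail = FALSE)),
         r.squared    = s$r.squared,
         coefficients = coef(modelobject))
}

set.seed(42)
x1 <- rnorm(100)
y1 <- 2 * x1 + rnorm(100)   # y1 depends on x1 plus noise
fit1 <- lm(y1 ~ x1)
lm_info(fit1)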
There is certainly a faster solution than this function, but it works.