I need to apply lm() to an enlarging subset of my dataframe dat, while making a prediction for the next observation. For example, I am doing:
fit model                 predict
----------------------    ----------------
dat[1:3, ]                dat[4, ]
dat[1:4, ]                dat[5, ]
    .                         .
    .                         .
dat[1:(nrow(dat)-1), ]    dat[nrow(dat), ]
I know what I should do for a particular subset (related to this question: predict() and newdata - How does this work?). For example, to predict the last row, I do:
dat1 = dat[1:(nrow(dat)-1), ]
dat2 = dat[nrow(dat), ]
fit = lm(log(clicks) ~ log(v1) + log(v12), data=dat1)
predict.fit = predict(fit, newdata=dat2, se.fit=TRUE)
How can I do this automatically for all subsets, and potentially extract what I want into a table?
- From fit, I'd need the summary(fit)$adj.r.squared value;
- From predict.fit, I'd need the predict.fit$fit value.
Thanks.
I just made up some random data to use for this example. I'm calling the object data because that was what it was called in the question at the time that I wrote this solution (call it anything you like).
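A minimal loop-based sketch of that idea, with made-up data (the names data, fits, summaries and preds are just illustrative placeholders, not fixed by this answer):

## Made-up data with the variables from the question; values kept positive for the logs
set.seed(1)
data <- data.frame(clicks = runif(30, 1, 100),
                   v1     = runif(30, 1, 100),
                   v12    = runif(30, 1, 100))

## Grow the fitting window one row at a time, storing every fit, summary and prediction
fits      <- list()
summaries <- list()
preds     <- list()
for (i in 3:(nrow(data) - 1)) {
  fits[[i]]      <- lm(log(clicks) ~ log(v1) + log(v12), data = data[1:i, ])
  summaries[[i]] <- summary(fits[[i]])
  preds[[i]]     <- predict(fits[[i]], newdata = data[i + 1, ], se.fit = TRUE)
}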
All of the results you want will be stored in the data objects this creates.
For example:
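With the placeholder fits, summaries and preds lists sketched above, individual pieces can be pulled out directly:

## Adjusted R-squared of the model fitted to the first 10 rows
summaries[[10]]$adj.r.squared
## Point prediction and its standard error for row 11
preds[[10]]$fit
preds[[10]]$se.fit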
You mentioned in the comments that you'd like a table of results. You can programmatically create tables of results from the 3 types of output files like this:
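One way to assemble such a table from those three lists (again, the names are the placeholders from the sketch above):

## Pull the adjusted R-squared and the one-step-ahead prediction into one table
idx <- 3:(nrow(data) - 1)
results <- data.frame(
  rows_fitted   = idx,
  adj.r.squared = sapply(idx, function(i) summaries[[i]]$adj.r.squared),
  prediction    = sapply(idx, function(i) unname(preds[[i]]$fit)),
  se.fit        = sapply(idx, function(i) preds[[i]]$se.fit)
)
head(results)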
(Efficient) solution
This is what you can do:
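One sketch of this approach, assuming dat contains the clicks, v1 and v12 columns from the question:

## A bundle() that fits on rows 1:i and predicts row i + 1 in one go
bundle <- function(i) {
  ## `subset` selects the rows used for fitting; `model = FALSE` drops the model frame
  fit <- lm(log(clicks) ~ log(v1) + log(v12), data = dat,
            subset = 1:i, model = FALSE)
  ## one-step-ahead prediction with its standard error
  pred <- predict(fit, newdata = dat[i + 1, ], se.fit = TRUE)
  c(adj.r2     = summary(fit)$adj.r.squared,
    prediction = unname(pred$fit),
    se         = pred$se.fit)
}

p <- 3  ## number of coefficients in the model
results <- t(sapply(p:(nrow(dat) - 1), bundle))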
Note I have done several things inside the bundle function:

- the subset argument is used to select the rows the model is fitted on;
- model = FALSE, so the model frame is not saved, which saves memory.

Overall, there is no obvious loop, but sapply is used. Its index runs:

- from p, the minimum number of data points required to fit a model with p coefficients;
- to nrow(dat) - 1, as we need at least the final row left over for prediction.

Test
Example data (with 30 "observations")
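One way to generate such toy data (values kept positive so the log transforms are defined; the exact numbers are immaterial):

## 30 rows of toy data with the variables used in the model
set.seed(0)
dat <- data.frame(clicks = runif(30, 1, 100),
                  v1     = runif(30, 1, 100),
                  v12    = runif(30, 1, 100))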
Applying the code above gives results (27 rows in total; the output is truncated to the first 5 rows here).

The first column is the adjusted R-squared value for the fitted model, while the second column is the prediction. The first value of adj.r2 is NaN, because the first model we fit has 3 coefficients for 3 data points, hence no sensible statistic is available. The same happens to se, as that fitted line has all-zero residuals, so the prediction carries no meaningful uncertainty estimate.
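A quick way to see that saturated first fit for yourself (using the toy dat above):

## 3 coefficients fitted to 3 data points: zero residual degrees of freedom
fit3 <- lm(log(clicks) ~ log(v1) + log(v12), data = dat[1:3, ])
summary(fit3)$adj.r.squared                              ## NaN
predict(fit3, newdata = dat[4, ], se.fit = TRUE)$se.fit  ## NaN as well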