Let's say I have some generic dataset for which an OLS regression is the best choice. I specify a model with some first-order terms and decide to use caret in R to get my regression coefficient estimates and error estimates.
In caret, this ends up being:
k10_cv = trainControl(method="cv", number=10)
ols_model = train(Y ~ X1 + X2 + X3, data = my_data, trControl = k10_cv, method = "lm")
From there, I can pull out regression information using summary(ols_model), and can also pull some more information by just calling ols_model.
When I just look at ols_model, are the RMSE/R-squared/MAE being calculated via the typical k-fold CV approach? Also, when the model I see in summary(ols_model) is generated, is this model trained on the entire dataset, or is it an average of models generated across each of the folds?
If not, in the interest of trading variance for bias, is there a way to acquire an OLS model within caret that is trained on one fold at a time?
Here's reproducible data for your example.
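The data used in the answer is not shown here, so the following is only a minimal sketch with simulated data. The sample size, the coefficients, and the seed are assumptions made for illustration; only the variable names (Y, X1, X2, X3, my_data) and the caret calls come from the question.

```r
library(caret)

# Simulated stand-in for my_data (assumed values, not the original data)
set.seed(42)
n <- 200
my_data <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
my_data$Y <- 1 + 2 * my_data$X1 - 0.5 * my_data$X2 + 0.25 * my_data$X3 + rnorm(n)

# Same setup as in the question: 10-fold CV around a linear model
k10_cv <- trainControl(method = "cv", number = 10)
ols_model <- train(Y ~ X1 + X2 + X3, data = my_data,
                   trControl = k10_cv, method = "lm")

# Cross-validated performance summary
ols_model$results
```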
1) The ols_model$results above is the mean of the metrics computed on each of the individual resamples (the held-out folds).
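One way to check this is to average the per-fold metrics that caret keeps in ols_model$resample and compare them with ols_model$results. This is a sketch on simulated data (the data, coefficients, and seed are assumptions, not the original example).

```r
library(caret)

# Simulated stand-in for my_data (assumed values)
set.seed(42)
n <- 200
my_data <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
my_data$Y <- 1 + 2 * my_data$X1 - 0.5 * my_data$X2 + 0.25 * my_data$X3 + rnorm(n)

k10_cv <- trainControl(method = "cv", number = 10)
ols_model <- train(Y ~ X1 + X2 + X3, data = my_data,
                   trControl = k10_cv, method = "lm")

# One row per held-out fold: RMSE, Rsquared, MAE, Resample
ols_model$resample

# Average the per-fold metrics by hand
metric_cols <- c("RMSE", "Rsquared", "MAE")
fold_means <- colMeans(ols_model$resample[, metric_cols])

# Metrics reported by caret for the (single) lm configuration
reported <- unlist(ols_model$results[1, metric_cols])

# Should be TRUE if the reported values are the fold averages
all.equal(unname(fold_means), unname(reported))
```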
2) The model is trained on the whole training set. You can check this either by fitting with lm directly or by specifying method = "none" for the trainControl. Either fit is identical to ols_model$finalModel.
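A sketch of that check, again on simulated data (the data, coefficients, and seed are assumptions): it compares the coefficients of ols_model$finalModel with a plain lm fit and with a caret fit that does no resampling.

```r
library(caret)

# Simulated stand-in for my_data (assumed values)
set.seed(42)
n <- 200
my_data <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
my_data$Y <- 1 + 2 * my_data$X1 - 0.5 * my_data$X2 + 0.25 * my_data$X3 + rnorm(n)

# Cross-validated caret fit, as in the question
k10_cv <- trainControl(method = "cv", number = 10)
ols_model <- train(Y ~ X1 + X2 + X3, data = my_data,
                   trControl = k10_cv, method = "lm")

# Plain lm on the full data
lm_fit <- lm(Y ~ X1 + X2 + X3, data = my_data)

# caret fit with resampling switched off
no_cv <- trainControl(method = "none")
none_model <- train(Y ~ X1 + X2 + X3, data = my_data,
                    trControl = no_cv, method = "lm")

# Both comparisons should be TRUE if the final model uses all rows
same_as_lm   <- all.equal(coef(ols_model$finalModel), coef(lm_fit))
same_as_none <- all.equal(coef(ols_model$finalModel), coef(none_model$finalModel))
same_as_lm
same_as_none
```

The cross-validation folds are used only to estimate performance; the coefficients reported by summary(ols_model) come from the single fit on all rows.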