How does Caret generate an OLS model with K-fold c

Let's say I have some generic dataset for which an OLS regression is the best choice. So, I generate a model with some first-order terms and decide to use Caret in R for my regression coefficient estimates and error estimates.

In caret, this ends up being:

k10_cv = trainControl(method="cv", number=10)
ols_model = train(Y ~ X1 + X2 + X3, data = my_data, trControl = k10_cv, method = "lm")

From there, I can pull out regression information using summary(ols_model) and can also pull some more information by just calling ols_model.

When I just print ols_model, are the RMSE/R-squared/MAE values calculated via the typical k-fold CV approach? Also, is the model I see in summary(ols_model) trained on the entire dataset, or is it an average of the models fitted across the folds?

If not, in the interest of trading variance for bias, is there a way to acquire an OLS model within Caret that is trained on one fold at a time?

Tags: r linear-regression cross-validation r-caret

1 Answer

This account has been banned

Floor 2 · 2019-08-15 18:51

Here's reproducible data for your example.

library("caret")
my_data <- iris

k10_cv <- trainControl(method="cv", number=10)

set.seed(100)
ols_model <- train(Sepal.Length ~  Sepal.Width + Petal.Length + Petal.Width,
                  data = my_data, trControl = k10_cv, method = "lm")


> ols_model$results
  intercept      RMSE  Rsquared       MAE     RMSESD RsquaredSD      MAESD
1      TRUE 0.3173942 0.8610242 0.2582343 0.03881222 0.04784331 0.02960042

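1) The values in ols_model$results above are the means of the per-fold metrics (the SD columns are their standard deviations), so yes, this is the usual k-fold CV estimate. The per-fold values are: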

> (ols_model$resample)
        RMSE  Rsquared       MAE Resample
1  0.3386472 0.8954600 0.2503482   Fold01
2  0.3154519 0.8831588 0.2815940   Fold02
3  0.3167943 0.8904550 0.2441537   Fold03
4  0.2644717 0.9085548 0.2145686   Fold04
5  0.3769947 0.8269794 0.3070733   Fold05
6  0.3720051 0.7792611 0.2746565   Fold06
7  0.3258501 0.8095141 0.2647466   Fold07
8  0.2962375 0.8530810 0.2731445   Fold08
9  0.3059100 0.8351535 0.2611982   Fold09
10 0.2615792 0.9286246 0.2108592   Fold10

I.e.

> mean(ols_model$resample$RMSE)==ols_model$results$RMSE
[1] TRUE
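To see where the per-fold numbers come from, the folds can be replayed by hand. This is only a sketch: it assumes the fitted object keeps each fold's training row indices in ols_model$control$index, and the names form, train_rows, fold_fits, fold_rmse and fold_coefs are made up for illustration.

```r
library(caret)

my_data <- iris
form <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width

set.seed(100)
ols_model <- train(form, data = my_data, method = "lm",
                   trControl = trainControl(method = "cv", number = 10))

# Training-row indices for each fold, as stored by train() (assumed layout).
train_rows <- ols_model$control$index

# Refit plain lm() on each fold's training rows: one model per fold.
fold_fits <- lapply(train_rows, function(idx) lm(form, data = my_data[idx, ]))

# Score each fold model on the rows it did not see during fitting.
fold_rmse <- mapply(function(fit, idx) {
  held_out <- my_data[-idx, ]
  sqrt(mean((held_out$Sepal.Length - predict(fit, held_out))^2))
}, fold_fits, train_rows)

# Coefficients of the per-fold models (one column per fold).
fold_coefs <- sapply(fold_fits, coef)

# Line up with caret's per-fold table by fold name before comparing.
caret_rmse <- ols_model$resample$RMSE[
  match(names(fold_rmse), ols_model$resample$Resample)]
all.equal(unname(fold_rmse), caret_rmse)
all.equal(mean(fold_rmse), ols_model$results$RMSE)
```

Each model in fold_fits is trained on nine of the ten folds and is used only to score the held-out fold. If per-fold coefficients are what you are after, fold_coefs has them, but caret does not average them into the model it returns. all.equal is used instead of == because the comparison is between floating-point values.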

2) The final model is trained on the whole training set, not averaged over the folds. You can check this either by fitting with lm directly or by specifying method = "none" in trainControl.

 coef(lm(Sepal.Length ~  Sepal.Width + Petal.Length + Petal.Width, data = my_data))
 (Intercept)  Sepal.Width Petal.Length  Petal.Width 
   1.8559975    0.6508372    0.7091320   -0.5564827

These coefficients are identical to those of ols_model$finalModel.
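The method = "none" route mentioned above can be sketched as follows. With no resampling, train fits the model once on all rows, so its coefficients should match a direct lm fit. The names form, fit_none and fit_lm are illustrative only.

```r
library(caret)

my_data <- iris
form <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width

# No resampling: train() fits a single model on the full data set.
fit_none <- train(form, data = my_data, method = "lm",
                  trControl = trainControl(method = "none"))

# Direct OLS fit for comparison.
fit_lm <- lm(form, data = my_data)

# Compare the coefficient values, ignoring names.
all.equal(unname(coef(fit_none$finalModel)), unname(coef(fit_lm)))
```

A fit made this way carries no resampling results, so it is useful for checking the final model but not for estimating out-of-sample error.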
