Test set and train set for each fold in Caret cros

Posted 2019-07-13 00:51

Question:

I am trying to understand how 5-fold cross-validation works in the caret package, but I could not find out how to get the train set and the test set for each fold, and the similar suggested questions did not cover it either. Suppose I want to cross-validate a random forest model. I do the following:

library(caret)
set.seed(12)
train_control <- trainControl(method = "cv", number = 5, savePredictions = TRUE)
rfmodel <- train(Species ~ ., data = iris, trControl = train_control, method = "rf")
first_holdout <- subset(rfmodel$pred, Resample == "Fold1")
str(first_holdout)
'data.frame':   90 obs. of  5 variables:
$ pred    : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1     
$ obs     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 
$ rowIndex: int  2 3 9 11 25 29 35 36 41 50 ...
$ mtry    : num  2 2 2 2 2 2 2 2 2 2 ...
$ Resample: chr  "Fold1" "Fold1" "Fold1" "Fold1" ...

Are these 90 observations in Fold1 used as the training set? If so, where is the test set for this fold?

Answer 1:

 str(rfmodel)

The fitted model object stores everything in the components listed below. Its control component holds the row indexes of the samples used for training in each fold (index) and of the corresponding hold-out samples (indexOut).

 names(rfmodel)
 #  [1] "method"       "modelInfo"    "modelType"    "results"      "pred"        
 #  [6] "bestTune"     "call"         "dots"         "metric"       "control"     
 # [11] "finalModel"   "preProcess"   "trainingData" "resample"     "resampledCM" 
 # [16] "perfNames"    "maximize"     "yLimits"      "times"        "levels"      
 # [21] "terms"        "coefnames"    "xlevels" 
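
Of these components, pred is the one the question inspects. As far as I understand caret, with savePredictions = TRUE it holds hold-out predictions only, one row per held-out sample per candidate mtry value, which would explain the 90 rows as 30 hold-out rows times 3 mtry values rather than a training set. The sketch below checks this on the model from the question; the names first_holdout and best_fold1 are only for illustration.

```r
library(caret)

set.seed(12)
train_control <- trainControl(method = "cv", number = 5, savePredictions = TRUE)
rfmodel <- train(Species ~ ., data = iris, trControl = train_control, method = "rf")

first_holdout <- subset(rfmodel$pred, Resample == "Fold1")

# Rows per candidate mtry: each hold-out row is predicted once per mtry value
table(first_holdout$mtry)

# Number of distinct hold-out rows in Fold1
length(unique(first_holdout$rowIndex))

# Keep only the predictions made with the selected mtry
best_fold1 <- subset(first_holdout, mtry == rfmodel$bestTune$mtry)
nrow(best_fold1)
```

Filtering on bestTune leaves one prediction per hold-out row, which is usually what you want when computing per-fold accuracy by hand.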

To access the indexes of the training and hold-out samples:

 # Indexes of Hold Out Sets
 rfmodel$control$indexOut

 # Indexes of Train Sets for above hold outs
 rfmodel$control$index
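
To turn those indexes into actual train and test sets, they can be used to subset the original data. This is a minimal sketch that assumes the row order of iris matches the data passed to train (true here, since no rows are dropped); folds are picked by position because the two lists may not share the same names, and the names fold1_train, fold1_test and folds are only for illustration.

```r
library(caret)

set.seed(12)
train_control <- trainControl(method = "cv", number = 5, savePredictions = TRUE)
rfmodel <- train(Species ~ ., data = iris, trControl = train_control, method = "rf")

# Row numbers (into iris) used to fit and to evaluate the first fold
train_rows <- rfmodel$control$index[[1]]
test_rows  <- rfmodel$control$indexOut[[1]]

fold1_train <- iris[train_rows, ]
fold1_test  <- iris[test_rows, ]

# The two sets should not overlap and together should cover every row
length(intersect(train_rows, test_rows))
nrow(fold1_train) + nrow(fold1_test) == nrow(iris)

# rowIndex in the saved predictions should refer to the same hold-out rows
fold1_pred <- subset(rfmodel$pred, Resample == "Fold1")
setequal(unique(fold1_pred$rowIndex), test_rows)

# Same split for every fold, as a list of train/test data frames
folds <- Map(function(tr, te) list(train = iris[tr, ], test = iris[te, ]),
             rfmodel$control$index, rfmodel$control$indexOut)
sapply(folds, function(f) c(train = nrow(f$train), test = nrow(f$test)))
```

Each element of folds then carries the train and test data frames for one fold, so folds[[2]]$test gives the hold-out rows of the second fold, and so on.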