I am wondering why predictions from 'Fold1' are actually predictions from the second fold in my predefined folds. I attach an example of what I mean.
# load the library
library(caret)
# load the cars dataset
data(cars)
# define folds
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
# define training control
train_control <- trainControl(method="cv", index = cv_folds, savePredictions = 'final')
# fix the parameters of the algorithm
# train the model
model <- caret::train(Price~., data=cars, trControl=train_control, method="gbm", verbose = FALSE)
# check whether the rows predicted under 'Fold1' appear in the second element of cv_folds
model$pred$rowIndex[model$pred$Resample == 'Fold1'] %in% cv_folds[[2]]
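The rows labelled 'Fold1' in model$pred$Resample are the records which are not in cv_folds[[1]]. Because createFolds was called with returnTrain = TRUE, each element of cv_folds holds the training rows of one resample, so the held-out records of Fold1 are contained in cv_folds[[2]] through cv_folds[[5]]. That is why your %in% check returns TRUE.

This is correct, as you are running a 5-fold cross-validation. Resample Fold1 is tested on one held-out fifth of the data after training the model on the other four fifths. Resample Fold2 is tested on a different fifth, and so on.

In summary: the predictions in Fold1 are the test predictions from a model trained on the rows in cv_folds[[1]], that is, on the other four folds.

Edit: based on comment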
All the needed info is in the model$pred table. I added a bit of code for clarification:
Basically, what you need for further stacking with the predictions are the pred and rowIndex columns from the model$pred table.

The rowIndex refers to the row in the original data, so rowIndex 610 refers to record 610 in the cars dataset. You can check this against obs, which holds the value of the Price column from the cars dataset for that row.
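
As a rough illustration of the points above, here is a minimal sketch. It is not the original poster's code: it assumes method = "lm" with a reduced formula (Price ~ Mileage + Cylinder) as a stand-in for gbm, purely so it runs quickly, and the seed and the names holdout, in_other_folds, fold1_rows and oof are made up for the example.

```r
library(caret)

data(cars)     # caret's cars data; Price is the outcome
set.seed(42)   # arbitrary seed, only so the folds are reproducible

# returnTrain = TRUE: each list element holds the TRAINING rows of a resample
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)

# Held-out rows of each resample = all rows minus its training rows
holdout <- lapply(cv_folds, function(train_rows) {
  setdiff(seq_len(nrow(cars)), train_rows)
})

# Are the Fold1 hold-out rows part of the training rows of the other resamples?
in_other_folds <- sapply(cv_folds[-1], function(train_rows) {
  all(holdout[[1]] %in% train_rows)
})

train_control <- trainControl(method = "cv", index = cv_folds,
                              savePredictions = "final")

# Assumption: "lm" and a small formula stand in for "gbm" to keep this fast
model <- train(Price ~ Mileage + Cylinder, data = cars,
               trControl = train_control, method = "lm")

# Rows that were predicted under the Fold1 label
fold1_rows <- sort(model$pred$rowIndex[model$pred$Resample == "Fold1"])

# Out-of-fold predictions ordered like the original data, ready for stacking
oof <- model$pred[order(model$pred$rowIndex), c("rowIndex", "pred", "obs")]
head(oof)
```

Comparing fold1_rows with holdout[[1]], and oof$obs with cars$Price[oof$rowIndex], is a quick way to confirm that rowIndex points back at the original rows before you feed oof$pred into a second-level model.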