Is there a way to perform stratified cross-validation when using the train function to fit a model to a large, imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen discussion of this topic but no definitive answer.
Thanks in advance.
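trainControl has an argument called 'index' that lets you specify the training rows for each cross-validation fold yourself. createFolds samples within the levels of the outcome when it is a factor, so the folds it returns are stratified by class.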
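library(caret)

folds <- 4
# returnTrain = TRUE gives the training rows for each fold, which is what 'index' expects
cvIndex <- createFolds(factor(training$Y), k = folds, returnTrain = TRUE)
tc <- trainControl(index = cvIndex,
                   method = "cv",
                   number = folds)
# verbose and ntree are passed through to the underlying random forest fit
rfFit <- train(Y ~ ., data = training,
               method = "rf",
               trControl = tc,
               maximize = TRUE,
               verbose = FALSE, ntree = 1000)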
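If you want to see what stratification buys you, or check it on your own data, the following is a rough base-R sketch rather than anything caret does internally. `make_stratified_index` is a hypothetical helper written for illustration, and the toy outcome `y` is made-up data. It assigns fold labels within each class and returns a list of training-row indices in the same shape as the createFolds output above, then reports the minority-class share in each held-out fold.

```r
# Hypothetical helper (not part of caret): assign rows to k folds within
# each class, then return the training-row indices for every fold.
make_stratified_index <- function(y, k, seed = 1) {
  set.seed(seed)
  y <- factor(y)
  fold_id <- integer(length(y))
  for (lev in levels(y)) {
    rows <- which(y == lev)
    # Spread this class as evenly as possible over folds 1..k, in random order
    fold_id[rows] <- sample(rep_len(seq_len(k), length(rows)))
  }
  # Training rows for fold f are all rows NOT held out in fold f
  index <- lapply(seq_len(k), function(f) which(fold_id != f))
  names(index) <- paste0("Fold", seq_len(k))
  index
}

# Toy imbalanced outcome (made-up data): 180 "neg" rows and 20 "pos" rows
y <- factor(rep(c("neg", "pos"), times = c(180, 20)))
cv_index <- make_stratified_index(y, k = 4)

# Share of the minority class in each held-out fold
held_out_pos <- sapply(cv_index, function(train_rows) {
  mean(y[-train_rows] == "pos")
})
print(held_out_pos)
```

With class counts that divide evenly by the number of folds, as here, every held-out fold keeps the overall 10% minority share. The same check (the `sapply` call) should also work on `cvIndex` from createFolds, with `training$Y` in place of `y`.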