Using sklearn, I want to have 3 splits (i.e. n_splits=3) of the sample dataset with a train/test ratio of 70:30. I'm able to split the set into 3 folds, but I'm not able to define the test size (similar to the train_test_split method). Is there a way to define the test sample size in StratifiedKFold?
from sklearn.model_selection import StratifiedKFold as SKF

# X and y are assumed to be NumPy arrays of features and class labels
skf = SKF(n_splits=3)
skf.get_n_splits(X, y)

for train_index, test_index in skf.split(X, y):
    # Loops over 3 iterations, each yielding a stratified train/test split
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
StratifiedKFold performs, by definition, a K-fold split. That is, the iterator it returns yields K-1 folds for training and 1 fold for testing on each iteration. K is controlled by n_splits, so it creates K groups of roughly n_samples/K samples each and uses every combination of K-1 groups for training, with the remaining group for testing. Refer to Wikipedia or search for "K-fold cross-validation" for more info about it.

In short, the size of the test set will be 1/K (i.e. 1/n_splits), so you can tune that parameter to control the test size (e.g. n_splits=3 gives a test split of 1/3 ≈ 33% of your data). However, StratifiedKFold will always iterate over all K combinations of K-1 training groups, which might not be what you want.

Having said that, you might be interested in StratifiedShuffleSplit, which lets you configure the number of splits and the train/test ratio independently. If you just want a single split, you can set n_splits=1 and still keep test_size=0.3 (or whatever ratio you want).
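
To make the difference concrete, here is a minimal sketch comparing the two splitters. The toy X and y arrays (100 samples, two balanced classes) and the random_state value are assumptions made up for illustration, not part of the original question; substitute your own data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

# Hypothetical toy data: 100 samples, 2 balanced classes.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 50 + [1] * 50)

# StratifiedKFold: the test fraction is tied to n_splits (about 1/3 here).
skf = StratifiedKFold(n_splits=3)
kfold_test_sizes = [len(test_idx) for _, test_idx in skf.split(X, y)]

# StratifiedShuffleSplit: the number of splits and the test fraction
# are set independently, so 3 splits at 70:30 is possible.
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.3, random_state=0)
shuffle_test_sizes = []
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    shuffle_test_sizes.append(len(test_idx))

print("StratifiedKFold test sizes:       ", kfold_test_sizes)
print("StratifiedShuffleSplit test sizes:", shuffle_test_sizes)
```

One design point to keep in mind: StratifiedShuffleSplit draws each split independently at random, so the test sets of different splits can overlap, whereas the test folds of StratifiedKFold are disjoint and together cover every sample exactly once.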