I'm using the XGBoost implementation for scikit-learn (sklearn) in a Kaggle competition. However, I'm getting this warning message:
$ python Script1.py
/home/sky/private/virtualenv15.0.1dev/myVE/local/lib/python2.7/site-packages/sklearn/cross_validation.py:516:
Warning: The least populated class in y has only 1 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=3.
  % (min_labels, self.n_folds)), Warning)
According to another question on Stack Overflow: "Check that you have at least 3 samples per class to be able to do StratifiedKFold cross validation with k == 3 (I think this is the default CV used by GridSearchCV for classification)."
And indeed, I don't have at least 3 samples per class.
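I verified this by counting the labels (place_id is my label vector):

    from collections import Counter

    # count how many samples each class has; the smallest
    # class has a single sample, which matches the warning
    label_counts = Counter(place_id)
    print(min(label_counts.values()))  # -> 1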
So my questions are:

a) What are the alternatives (one idea I had is sketched below)?
b) Why can't I use cross-validation here?
c) What can I use instead?
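For a), one workaround I'm considering is passing a plain, non-stratified KFold explicitly instead of the default StratifiedKFold, and handing it to the cv= argument of GridSearchCV below. Would something like this be valid? (This uses the old pre-0.18 cross_validation API, since that's what my traceback shows.)

    from sklearn.cross_validation import KFold

    # a plain KFold splits by position and ignores the class labels,
    # so classes with a single sample don't trigger the warning
    cv_folds = KFold(n=len(train_x), n_folds=3, shuffle=True, random_state=27)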
...
param_test1 = {
    'max_depth': range(3, 10, 2),
    'min_child_weight': range(1, 6, 2)
}

grid_search = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1,
        n_estimators=3000,
        max_depth=15,
        min_child_weight=1,
        gamma=0,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='multi:softmax',
        nthread=42,
        scale_pos_weight=1,
        seed=27),
    param_grid=param_test1,
    scoring='roc_auc',
    n_jobs=42,
    iid=False,
    cv=None,  # cv=None -> the default 3-fold StratifiedKFold for classifiers
    verbose=1)
...
grid_search.fit(train_x, place_id)
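Alternatively (question c), I'm thinking of simply dropping the classes that have fewer than n_folds samples before running the grid search, roughly like this (assuming train_x is a numpy array and place_id a numpy array of labels):

    import numpy as np
    from collections import Counter

    # keep only classes with at least 3 samples, so StratifiedKFold
    # can place each class in every one of the 3 folds
    counts = Counter(place_id)
    mask = np.array([counts[label] >= 3 for label in place_id])
    train_x, place_id = train_x[mask], place_id[mask]

Is that a reasonable approach, or is there something built in for this?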
References:
One-shot learning with scikit-learn
Using a support vector classifier with polynomial kernel in scikit-learn