I am confused about the difference between the cross_val_score scoring metric 'roc_auc' and the roc_auc_score that I can just import and call directly.
The documentation (http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter) indicates that specifying scoring='roc_auc' will use sklearn.metrics.roc_auc_score. However, when I implement GridSearchCV or cross_val_score with scoring='roc_auc', I get very different numbers than when I call roc_auc_score directly.
Here is my code to help demonstrate what I see:
# score the model using cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rf = RandomForestClassifier(n_estimators=150,
                            min_samples_leaf=4,
                            min_samples_split=3,
                            n_jobs=-1)
scores = cross_val_score(rf, X, y, cv=3, scoring='roc_auc')
print(scores)
# array([ 0.9649023 , 0.96242235, 0.9503313 ])
# do a train_test_split, fit the model, and score with roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
rf.fit(X_train, y_train)
print(roc_auc_score(y_test, rf.predict(X_test)))
# 0.84634039111363313  -- quite a bit different than the scores above!
I feel like I am missing something very simple here -- most likely a mistake in how I am implementing/interpreting one of the scoring metrics.
Can anyone shed any light on the reason for the discrepancy between the two scoring metrics?
This is because you supplied predicted y's instead of probabilities to roc_auc_score. This function takes a score, not a classified label. Try this instead:
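Something along these lines (a minimal sketch, assuming a binary target and the rf model from the question; the original answer's exact snippet isn't preserved in this copy):

# pass the probability of the positive class instead of the hard 0/1 predictions
print(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))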
It should give a similar result to the one you got from cross_val_score. Refer to this post for more info.
Ran into this problem myself and after digging a bit found the answer. Sharing for the love.
There are actually two and a half problems.
First, you have to feed roc_auc_score probabilities rather than predicted labels (using the predict_proba() method). BUT, some estimators (like SVC) do not have a predict_proba() method; you then use the decision_function() method instead.

Here's a full example:
Using two estimators:
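The original snippets aren't preserved in this copy, so what follows is a sketch under assumed names: two estimators, one with predict_proba() and one without (the choice of RandomForestClassifier and SVC, and their parameters, is an assumption):

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# rf has predict_proba(); SVC with the default probability=False does not
rf = RandomForestClassifier(n_estimators=150, n_jobs=-1)
svc = SVC(kernel='linear')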
Split the data into train and test sets, but keep the split in variables we can reuse.
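For example (test_size and random_state are assumptions):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)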
Feed it to GridSearchCV and save the scores. Note we are passing fourfold.
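A sketch of that step; here fourfold is assumed to be a reusable 4-fold splitter and the parameter grid is purely illustrative:

from sklearn.model_selection import GridSearchCV, KFold

fourfold = KFold(n_splits=4, shuffle=True, random_state=0)  # assumed definition of 'fourfold'
param_grid = {'min_samples_leaf': [2, 4]}                   # illustrative grid
grid = GridSearchCV(rf, param_grid, cv=fourfold, scoring='roc_auc')
grid.fit(X_train, y_train)
grid_score = grid.best_score_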
Feed it to cross_val_score and save the scores. Sometimes you want to loop and compute several different scores, so this is what you use.
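Again a sketch, reusing the same splitter so the folds line up with the grid search above:

from sklearn.model_selection import cross_val_score

cv_scores = cross_val_score(rf, X_train, y_train, cv=fourfold, scoring='roc_auc')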
Do we have the same scores across the board?
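One way to check, assuming the names above and that the comparison is against a manual roc_auc_score on the held-out test set:

from sklearn.metrics import roc_auc_score

# fit on the training set and compute the same metric by hand from predict_proba()
rf.fit(X_train, y_train)
manual_score = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
print(grid_score, cv_scores.mean(), manual_score)  # these should be in the same ballpark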
BUT, sometimes our estimator does not have a predict_proba() method. So, according to this example, we do this:
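A sketch of that variant, using the SVC instance from above and its decision_function() scores:

svc.fit(X_train, y_train)
svc_score = roc_auc_score(y_test, svc.decision_function(X_test))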
I just ran into a similar issue here. The key takeaway there was that cross_val_score uses the KFold strategy (StratifiedKFold when the estimator is a classifier) with default parameters for making the train-test splits, which means splitting into consecutive chunks rather than shuffling. train_test_split, on the other hand, does a shuffled split. The solution is to make the split strategy explicit and to specify shuffling, like this:
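For instance (a sketch; n_splits and random_state are assumptions, and StratifiedKFold is used since the target is a class label):

from sklearn.model_selection import StratifiedKFold, cross_val_score

# make the CV strategy explicit and turn shuffling on, like train_test_split does
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(rf, X, y, cv=cv, scoring='roc_auc')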