I am confused about the difference between the cross_val_score scoring metric 'roc_auc' and the roc_auc_score that I can just import and call directly.
The documentation (http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter) indicates that specifying scoring='roc_auc' will use sklearn.metrics.roc_auc_score. However, when I run GridSearchCV or cross_val_score with scoring='roc_auc', I get very different numbers than when I call roc_auc_score directly.
Here is my code to help demonstrate what I see:
# score the model using cross_val_score
rf = RandomForestClassifier(n_estimators=150,
                            min_samples_leaf=4,
                            min_samples_split=3,
                            n_jobs=-1)
scores = cross_val_score(rf, X, y, cv=3, scoring='roc_auc')
print scores
array([ 0.9649023 , 0.96242235, 0.9503313 ])
# do a train_test_split, fit the model, and score with roc_auc_score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
rf.fit(X_train, y_train)
print roc_auc_score(y_test, rf.predict(X_test))
0.84634039111363313 # quite a bit different than the scores above!
I feel like I am missing something very simple here -- most likely a mistake in how I am implementing/interpreting one of the scoring metrics.
Can anyone shed any light on the reason for the discrepancy between the two scoring metrics?
This is because you supplied the predicted class labels instead of probabilities to roc_auc_score. That function expects a continuous score (a probability or decision value), not the predicted label. Try this instead:
print roc_auc_score(y_test, rf.predict_proba(X_test)[:,1])
It should give a result similar to the scores from cross_val_score above. Refer to this post for more info.
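To make the difference concrete, here is a minimal sketch (assuming the same fitted rf, X_test, and y_test as in the question) showing what each call actually hands to roc_auc_score:
# rf.predict() returns hard 0/1 labels, so the ROC "curve" has a single threshold
print(roc_auc_score(y_test, rf.predict(X_test)))
# rf.predict_proba() returns one probability per class; column 1 is P(y = 1),
# which gives roc_auc_score a full ranking of the samples to work with
print(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))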
I just ran into a similar issue here. The key takeaway there was that cross_val_score uses the KFold strategy with default parameters for making the train-test splits, which means splitting into consecutive chunks rather than shuffling. train_test_split, on the other hand, does a shuffled split.
The solution is to make the split strategy explicit and specify shuffling, like this:
shuffle = cross_validation.KFold(len(X), n_folds=3, shuffle=True)
scores = cross_val_score(rf, X, y, cv=shuffle, scoring='roc_auc')
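If you are on a newer scikit-learn where the cross_validation module has been replaced by model_selection, a roughly equivalent sketch (same idea, newer API) would be:
from sklearn.model_selection import KFold, cross_val_score

# Shuffle the rows before splitting them into 3 folds
shuffle = KFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(rf, X, y, cv=shuffle, scoring='roc_auc')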
Ran into this problem myself and after digging a bit found the answer. Sharing for the love.
There are actually two and a half problems.
- you need to use the same KFold (the same train/test splits) to compare the scores;
- you need to feed probabilities into roc_auc_score (using the predict_proba() method). BUT some estimators (like SVC) do not have a predict_proba() method; in that case you use the decision_function() method instead (see the small helper sketched below).
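A small helper along these lines (hypothetical, not part of the original answer) captures that fallback logic:
def positive_class_scores(estimator, X):
    # Use class-1 probabilities when the estimator provides them...
    if hasattr(estimator, 'predict_proba'):
        return estimator.predict_proba(X)[:, 1]
    # ...otherwise fall back to raw decision scores, which roc_auc_score
    # accepts just as well since it only looks at the ranking
    return estimator.decision_function(X)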
Here's a full example:
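The original answer leaves out the imports; assuming the sklearn.model_selection API used below, they would look roughly like this:
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
from sklearn.metrics import roc_auc_score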
# Let's use the Digits dataset
digits = load_digits(n_class=4)
X, y = digits.data, digits.target
y[y == 2] = 0  # Increase problem difficulty
y[y == 3] = 1  # even more
Using two estimators:
LR = LogisticRegression()
SVM = LinearSVC()
Split the train/test set, but keep the splitter in a variable so we can reuse it.
fourfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=4)  # shuffle=True so random_state takes effect
Feed it to GridSearchCV and save the scores. Note that we are passing fourfold as cv.
gs = GridSearchCV(LR, param_grid={}, cv=fourfold, scoring='roc_auc', return_train_score=True)
gs.fit(X, y)
gskeys = ['split%d_test_score' % i for i in range(4)]  # per-fold test scores in cv_results_
gs_scores = np.array([gs.cv_results_[k][0] for k in gskeys])
Feed it to cross_val_score and save the scores.
cv_scores = cross_val_score(LR, X, y, cv=fourfold, scoring='roc_auc')
Sometimes you want to loop over the folds yourself and compute several different scores; in that case this is what you use.
loop_scores = list()
for idx_train, idx_test in fourfold.split(X, y):
    X_train, y_train, X_test, y_test = X[idx_train], y[idx_train], X[idx_test], y[idx_test]
    LR.fit(X_train, y_train)
    y_prob = LR.predict_proba(X_test)
    auc = roc_auc_score(y_test, y_prob[:, 1])
    loop_scores.append(auc)
Do we have the same scores across the board?
print [((a==b) and (b==c)) for a,b,c in zip(gs_scores,cv_scores,loop_scores)]
>>> [True, True, True, True]
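If the exact equality check ever fails because of floating-point noise (it should not here, since all three paths do the same computation), a tolerance-based comparison is safer; for example:
print(np.allclose(gs_scores, cv_scores) and np.allclose(cv_scores, loop_scores))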
BUT, sometimes our estimator does not have a predict_proba() method. So, according to this example, we do this:
for idx_train, idx_test in fourfold.split(X, y):
    X_train, y_train, X_test, y_test = X[idx_train], y[idx_train], X[idx_test], y[idx_test]
    SVM.fit(X_train, y_train)
    y_prob = SVM.decision_function(X_test)
    prob_pos = (y_prob - y_prob.min()) / (y_prob.max() - y_prob.min())
    auc = roc_auc_score(y_test, prob_pos)
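One side note (my addition, not part of the original answer): roc_auc_score only depends on how the scores rank the samples, so the min-max rescaling above is cosmetic; inside the same loop, passing the raw decision values gives the same AUC:
    auc = roc_auc_score(y_test, y_prob)  # same AUC as with the rescaled prob_pos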