I would like to get predicted probabilities from a Logistic Regression model with cross-validation. I know you can get the cross-validation scores, but is it possible to return the values from predict_proba instead of the scores?
# imports
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import (StratifiedKFold, cross_val_score,
                                      train_test_split)
from sklearn import datasets
# setup data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# setup model
cv = StratifiedKFold(y, 10)
logreg = LogisticRegression()
# cross-validation scores
scores = cross_val_score(logreg, X, y, cv=cv)
# predict probabilities
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
logreg.fit(Xtrain, ytrain)
proba = logreg.predict_proba(Xtest)
There is a function cross_val_predict that gives you the predicted values, but there is no such function for "predict_proba" yet. Maybe we could make that an option.
This is easy to implement:
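A minimal sketch of such a helper, assuming the current sklearn.model_selection API. The function name cross_val_apply and its predict and combine parameters are hypothetical, chosen only for illustration; it also assumes every training fold contains all classes (true for stratified folds on iris), so the probability columns line up across folds.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def cross_val_apply(estimator, X, y, cv,
                    predict=lambda est, X_: est.predict_proba(X_),
                    combine=np.vstack):
    """Hypothetical helper: out-of-fold output of `predict` for every sample."""
    fold_outputs, fold_indices = [], []
    for train_idx, test_idx in cv.split(X, y):
        # Fit a fresh copy on the training part of this fold.
        est = clone(estimator).fit(X[train_idx], y[train_idx])
        fold_outputs.append(predict(est, X[test_idx]))
        fold_indices.append(test_idx)
    stacked = combine(fold_outputs)
    # Put the rows back into the original sample order.
    order = np.argsort(np.concatenate(fold_indices))
    return stacked[order]


X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=10)
logreg = LogisticRegression(max_iter=1000)

# Out-of-fold class probabilities, one row per sample.
proba = cross_val_apply(logreg, X, y, cv)

# Out-of-fold class labels: 1-D outputs are joined with np.hstack instead.
labels = cross_val_apply(logreg, X, y, cv,
                         predict=lambda est, X_: est.predict(X_),
                         combine=np.hstack)
```

Each sample lands in exactly one test fold, so the result has one row per sample and every row comes from a model that never saw that sample during fitting.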
This one returns predict_proba. If you need both predict and predict_proba, just change the predict and combine arguments.
This is now implemented as part of scikit-learn version 0.18. You can pass a 'method' string parameter to the cross_val_predict function. See the cross_val_predict documentation for details.
Example:
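A short sketch of that call on the iris data from the question, assuming scikit-learn 0.18 or later. The max_iter value is an assumption added only to avoid convergence warnings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
logreg = LogisticRegression(max_iter=1000)

# With an integer cv and a classifier, stratified folds are used.
# The result has one row per sample and one column per class.
proba = cross_val_predict(logreg, X, y, cv=10, method="predict_proba")
print(proba.shape)
```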
Also note that this is part of the new sklearn.model_selection package, so you will need to import cross_val_predict from sklearn.model_selection.
An easy workaround for this is to create a wrapper class whose predict method delegates to predict_proba, and then pass an instance of it as the classifier object to cross_val_predict.
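A wrapper of this kind could look roughly like the sketch below. The class name ProbaLogisticRegression is hypothetical, and the example assumes the sklearn.model_selection version of cross_val_predict.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


class ProbaLogisticRegression(LogisticRegression):
    """Hypothetical wrapper: predict() returns class probabilities."""

    def predict(self, X):
        return self.predict_proba(X)


X, y = load_iris(return_X_y=True)

# cross_val_predict calls predict() by default, which now yields probabilities.
proba = cross_val_predict(ProbaLogisticRegression(max_iter=1000), X, y, cv=10)
```

Because predict no longer returns labels, methods that rely on it, such as score, will not behave sensibly on the wrapper, so it is best used only for collecting out-of-fold probabilities.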