I'm trying to evaluate multiple machine learning algorithms with sklearn for a couple of metrics (accuracy, recall, precision and maybe more).
From what I understand from the documentation here and from the source code (I'm using sklearn 0.17), the cross_val_score function accepts only one scorer per execution. So to calculate multiple scores, I have to either:
- Execute it multiple times
- Implement my own (time-consuming and error-prone) scorer
I've executed it multiple times with this code:
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.cross_validation import cross_val_score
    import time
    from sklearn.datasets import load_iris

    iris = load_iris()

    models = [GaussianNB(), DecisionTreeClassifier(), SVC()]
    names = ["Naive Bayes", "Decision Tree", "SVM"]
    for model, name in zip(models, names):
        print name
        start = time.time()
        for score in ["accuracy", "precision", "recall"]:
            print score,
            print " : ",
            print cross_val_score(model, iris.data, iris.target, scoring=score, cv=10).mean()
        print time.time() - start
And I get this output:
Naive Bayes
accuracy : 0.953333333333
precision : 0.962698412698
recall : 0.953333333333
0.0383198261261
Decision Tree
accuracy : 0.953333333333
precision : 0.958888888889
recall : 0.953333333333
0.0494720935822
SVM
accuracy : 0.98
precision : 0.983333333333
recall : 0.98
0.063080072403
Which is OK, but it's slow for my own data. How can I measure all the scores at once?
Since the time of writing this post, scikit-learn has been updated and made my answer obsolete; see the much cleaner solution below.
You can write your own scoring function to capture all three pieces of information; however, a scoring function for cross-validation must return only a single number in scikit-learn (this is likely for compatibility reasons). Below is an example where each of the scores for each cross-validation slice prints to the console, and the returned value is just the sum of the three metrics. If you want to return all these values, you're going to have to make some changes to cross_val_score (line 1351 of cross_validation.py) and _score (line 1601 of the same file).
Which gives:
As of scikit-learn 0.19.0 the solution becomes much easier
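
For reference, here is a minimal sketch of the multi-metric approach, assuming scikit-learn 0.19 or later and Python 3. It relies on cross_validate from sklearn.model_selection, which accepts a list of scorer names and returns a dict with one "test_<name>" array per metric. The "_macro" averaging for precision and recall is my own assumption (Iris has three classes, so some averaging has to be chosen), as are the `results` dict and the `random_state` value.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
}

# Iris is a three-class problem, so precision and recall need an explicit
# averaging strategy; the "_macro" variants are an assumption of this sketch.
scoring = ["accuracy", "precision_macro", "recall_macro"]

results = {}
for name, model in models.items():
    # A single cross-validation run per model evaluates every metric in `scoring`.
    cv_results = cross_validate(model, iris.data, iris.target, scoring=scoring, cv=10)
    # Per-fold scores are stored under "test_<metric>"; average them per metric.
    results[name] = {metric: cv_results["test_" + metric].mean() for metric in scoring}
    print(name, results[name])
```

Each model is fitted once per fold rather than once per fold per metric, which is where the time saving over repeated cross_val_score calls comes from. The returned dict also carries "fit_time" and "score_time" arrays if you want timings.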
Which gives:
I ran into the same problem and created a module that supports multiple metrics in cross_val_score.
To accomplish what you want with this module, you can write:
You can check and download this module from GitHub. Hope it helps.