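Does `cross_val_predict` (see the docs, v0.18) with the k-fold method, as in the code below, calculate the accuracy for each fold and then average the results, or not?

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

cv = KFold(n_splits=20)  # 20 unshuffled folds
clf = SVC()
ypred = cross_val_predict(clf, td, labels, cv=cv)  # td, labels: the asker's data
accuracy = accuracy_score(labels, ypred)
print(accuracy)
```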
No, it does not!
According to the cross-validation documentation page, `cross_val_predict` does not return any scores, only the labels predicted according to a particular strategy described there. Therefore, by calling `accuracy_score(labels, ypred)`, you are just computing the accuracy of the labels predicted by that strategy compared to the true labels, which the same documentation page also points out.
If you need the accuracy of each fold, you should try `cross_val_score`, which returns one score per fold.
Then, for the mean accuracy over all folds, call `.mean()` on the returned array of scores (e.g. `scores.mean()`).
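A minimal sketch of the per-fold alternative, assuming the scikit-learn ≥ 0.18 `model_selection` API. The iris dataset stands in for the question's `td` and `labels`, and `n_splits=5` with shuffling is an illustrative choice (iris is sorted by class, so unshuffled folds would be unrepresentative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Stand-in data for the question's `td` (features) and `labels` (targets).
td, labels = load_iris(return_X_y=True)

# Shuffled because iris is ordered by class; the question's KFold did not shuffle.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC()

# One value per fold; SVC's default score is accuracy.
scores = cross_val_score(clf, td, labels, cv=cv)
print(scores)         # five per-fold accuracies
print(scores.mean())  # mean accuracy over the folds
```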
Edit:
For calculating Cohen's kappa coefficient and the confusion matrix, I assumed you mean the kappa coefficient and the confusion matrix between the true labels and each fold's predicted labels.

Edit 2:
What does `cross_val_predict` return?
`KFold` splits the data into k parts, and `cross_val_predict` then runs k iterations: in iteration i it takes all parts except the i-th as training data, fits the model on them, and predicts the labels of the i-th part (the test data). So each iteration predicts the labels of one part of the data. At the end, `cross_val_predict` merges all the partially predicted labels and returns them as a whole.
The process can be reproduced step by step with an explicit loop over the folds; for a deterministic estimator such as `SVC` and the same splits, the merged predictions are identical to those returned by `cross_val_predict`.
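The loop described above can be sketched with an explicit `for` over `KFold.split`, assuming iris as stand-in data and illustrative fold settings. Inside the loop it also prints Cohen's kappa and the confusion matrix between the true labels and each fold's predictions, which is one reading of the first edit; `manual_pred` and `fold_kappas` are hypothetical names. The final check compares the merged predictions with `cross_val_predict`, which should agree here because `SVC` is deterministic and the fixed `random_state` reproduces the same splits:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

# Stand-in data for the question's `td` and `labels`.
td, labels = load_iris(return_X_y=True)
# Shuffled because iris is sorted by class; the fixed random_state makes
# every call to cv.split() yield the same folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC()

manual_pred = np.empty_like(labels)  # merged predictions, filled fold by fold
fold_kappas = []
for i, (train_idx, test_idx) in enumerate(cv.split(td)):
    # Fit a fresh copy of the model on every part except the i-th ...
    model = clone(clf).fit(td[train_idx], labels[train_idx])
    # ... and predict only the i-th (held-out) part.
    fold_pred = model.predict(td[test_idx])
    manual_pred[test_idx] = fold_pred

    # Per-fold agreement between the true labels and this fold's predictions.
    fold_kappas.append(cohen_kappa_score(labels[test_idx], fold_pred))
    print(f"fold {i}: accuracy={accuracy_score(labels[test_idx], fold_pred):.3f}, "
          f"kappa={fold_kappas[-1]:.3f}")
    print(confusion_matrix(labels[test_idx], fold_pred, labels=np.unique(labels)))

# cross_val_predict merges the per-fold predictions into one array the same way.
ypred = cross_val_predict(clf, td, labels, cv=cv)
print(np.array_equal(manual_pred, ypred))
```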
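I would like to add a quick and easy option on top of what the previous answers contributed.
If you take the micro average of F1, you get the accuracy rate.
This works mathematically because micro-averaging pools the true positives, false positives and false negatives of all classes before computing precision and recall. In single-label multiclass classification, every misclassified sample counts as exactly one false positive (for the predicted class) and one false negative (for the true class), so micro precision, micro recall and micro F1 all equal the accuracy.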
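A small check of the micro-F1 identity, again a sketch assuming iris as stand-in data and illustrative fold settings. It applies `f1_score(..., average="micro")` to the `cross_val_predict` output, and the built-in `"f1_micro"` scoring string per fold:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.svm import SVC

td, labels = load_iris(return_X_y=True)  # stand-in data
cv = KFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC()

# On the merged predictions: micro F1 equals accuracy (up to float rounding).
ypred = cross_val_predict(clf, td, labels, cv=cv)
acc = accuracy_score(labels, ypred)
micro_f1 = f1_score(labels, ypred, average="micro")
print(acc, micro_f1)

# The same identity holds fold by fold with the "f1_micro" scorer.
fold_f1 = cross_val_score(clf, td, labels, cv=cv, scoring="f1_micro")
fold_acc = cross_val_score(clf, td, labels, cv=cv, scoring="accuracy")
print(fold_f1.mean(), fold_acc.mean())
```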
Good luck.
As the documentation of `sklearn.model_selection.cross_val_predict` states, it generates cross-validated estimates for each input data point.
As you can see from the source code of `cross_val_predict` on GitHub, the function computes the predictions for each fold and concatenates them. The predictions for each fold are made by a model trained on the other folds. Here is a combination of your code and the example provided in the source code:
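A sketch in that spirit (not the original combined snippet), assuming iris as stand-in data. `n_splits=7` is chosen only so that the 150 samples fall into folds of unequal size: the accuracy of the concatenated predictions then equals the per-fold accuracies averaged with weights proportional to fold size, and matches the plain mean of the fold accuracies only when all folds have the same size:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

td, labels = load_iris(return_X_y=True)  # stand-in data
# 150 samples into 7 folds gives unequal fold sizes (22 or 21 samples).
cv = KFold(n_splits=7, shuffle=True, random_state=0)
clf = SVC()

ypred = cross_val_predict(clf, td, labels, cv=cv)
pooled = accuracy_score(labels, ypred)  # what the question computes

# Per-fold accuracy, read back from the concatenated predictions.
fold_acc, fold_size = [], []
for _, test_idx in cv.split(td):
    fold_acc.append(accuracy_score(labels[test_idx], ypred[test_idx]))
    fold_size.append(len(test_idx))

print(pooled)
print(np.mean(fold_acc))                        # plain mean over folds
print(np.average(fold_acc, weights=fold_size))  # fold-size-weighted mean == pooled
```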
Finally, to answer your question: no, the accuracy is not averaged over the folds.