I'm just wondering if this is a legitimate way of calculating classification accuracy:
- obtain precision recall thresholds
- for each threshold binarize the continuous y_scores
- calculate their accuracy from the contingency table (confusion matrix)
- return the average accuracy across the thresholds
```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_curve
from sklearn.preprocessing import binarize

# precision_recall_curve returns (precision, recall, thresholds) in that order
precision, recall, thresholds = precision_recall_curve(np.array(np_y_true), np.array(np_y_scores))

accuracy = 0
for threshold in thresholds:
    # binarize expects a 2D array, hence the reshape and the [0] afterwards
    y_pred = binarize(np.array(np_y_scores).reshape(1, -1), threshold=threshold)[0]
    contingency_table = confusion_matrix(np_y_true, y_pred)
    # accuracy = (TN + TP) / total
    accuracy += (contingency_table[0, 0] + contingency_table[1, 1]) / float(np.sum(contingency_table))

print("Classification accuracy is: {}".format(accuracy / len(thresholds)))
```
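For comparison, here is a shorter way to express the same per-threshold loop. This is only a sketch using scikit-learn's accuracy_score, with a small synthetic example standing in for np_y_true / np_y_scores:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_curve

# small synthetic example in place of np_y_true / np_y_scores
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# accuracy at each threshold: predict 1 where the score reaches the threshold
accuracies = [accuracy_score(y_true, (y_scores >= t).astype(int)) for t in thresholds]
print("Mean accuracy over thresholds: {}".format(np.mean(accuracies)))
```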
You are heading in the right direction. The confusion matrix is definitely the right starting point for computing the accuracy of your classifier. It seems to me that you are aiming at receiver operating characteristics (ROC).
The AUC (area under the curve) is a measure of your classifier's performance. More information and explanation can be found here:
https://stats.stackexchange.com/questions/132777/what-does-auc-stand-for-and-what-is-it
http://mlwiki.org/index.php/ROC_Analysis
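For a concrete picture, here is a minimal sketch of computing the ROC curve and the AUC with scikit-learn's roc_curve and roc_auc_score; the y_true / y_scores arrays below are just illustrative:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# illustrative data; replace with your own labels and scores
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# ROC curve: false positive rate and true positive rate at each threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

# AUC summarizes the curve in a single number (1.0 = perfect, 0.5 = chance)
print("AUC: {}".format(roc_auc_score(y_true, y_scores)))
```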
This is my implementation, which you are welcome to improve or comment on: