import numpy as np
from sklearn.metrics import roc_curve

correct_classification = np.array([0, 1])
predicted_classification = np.array([1, 1])
false_positive_rate, true_positive_rate, thresholds = roc_curve(correct_classification, predicted_classification)
print(false_positive_rate)
print(true_positive_rate)
From https://en.wikipedia.org/wiki/Sensitivity_and_specificity :
True positive: Sick people correctly identified as sick
False positive: Healthy people incorrectly identified as sick
True negative: Healthy people correctly identified as healthy
False negative: Sick people incorrectly identified as healthy
I'm using these values: 0 = sick, 1 = healthy (tallied in the sketch below).
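
To make the counting concrete, here is a minimal sketch (my assumption: sick = 0 is treated as the positive class, following the medical convention above) that tallies the four outcomes for the arrays used earlier:

import numpy as np

y_true = np.array([0, 1])  # 0 = sick, 1 = healthy
y_pred = np.array([1, 1])  # both people predicted healthy

tp = np.sum((y_true == 0) & (y_pred == 0))  # sick correctly identified as sick
fp = np.sum((y_true == 1) & (y_pred == 0))  # healthy incorrectly identified as sick
tn = np.sum((y_true == 1) & (y_pred == 1))  # healthy correctly identified as healthy
fn = np.sum((y_true == 0) & (y_pred == 1))  # sick incorrectly identified as healthy
print(tp, fp, tn, fn)  # 0 0 1 1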
From https://en.wikipedia.org/wiki/False_positive_rate :
false positive rate = false positive / (false positive + true negative)
Number of false positives: 0; number of true negatives: 1.
Therefore false positive rate = 0 / (0 + 1) = 0.
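
As a cross-check on this manual calculation, sklearn's confusion_matrix gives the same counts (a sketch; labels=[0, 1] keeps 0 = sick in the first row and column):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 1])
y_pred = np.array([1, 1])
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])  # rows = actual, cols = predicted
tp, fn = cm[0]  # actual sick: predicted sick, predicted healthy
fp, tn = cm[1]  # actual healthy: predicted sick, predicted healthy
print(fp / (fp + tn))  # 0.0, matching the manual result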
Reading the return values for roc_curve (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve):
fpr : array, shape = [>2]
Increasing false positive rates such that element i is the false positive rate of predictions with score >= thresholds[i].
tpr : array, shape = [>2]
Increasing true positive rates such that element i is the true positive rate of predictions with score >= thresholds[i].
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.
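
Putting the three return values side by side for the input above makes the pairing explicit. Note that roc_curve treats 1 (healthy, in my encoding) as the positive label by default, not 0. A sketch; the first threshold, max(y_score) + 1 = 2 here, follows the documentation quoted above (newer scikit-learn versions set it to np.inf instead):

import numpy as np
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(np.array([0, 1]), np.array([1, 1]))
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"score >= {th}: fpr = {f}, tpr = {t}")
# score >= 2.0: fpr = 0.0, tpr = 0.0  (nothing scores >= 2, so nothing is predicted positive)
# score >= 1.0: fpr = 1.0, tpr = 1.0  (both instances score 1, so everything is predicted positive)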
How does this differ from my manual calculation of the false positive rate, and how are the thresholds set? Some more information on thresholds is given here: https://datascience.stackexchange.com/questions/806/advantages-of-auc-vs-standard-accuracy , but I'm confused about how it fits with this implementation.
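
For reference, here is a sketch with continuous scores (made-up probabilities, not from my example) where roc_curve has several distinct thresholds to sweep: each threshold is one of the unique score values taken in decreasing order, and each (fpr, tpr) pair is what you get by predicting positive whenever score >= that threshold:

import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # hypothetical classifier scores
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(thresholds)  # [1.8  0.8  0.4  0.35 0.1]  (decreasing; 1.8 = max(y_score) + 1 on this version)
print(fpr)         # [0.   0.   0.5  0.5  1. ]
print(tpr)         # [0.   0.5  0.5  1.   1. ]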