I ran a logistic regression model and made predictions of the logit values. I used this to get the points on the ROC curve:
from sklearn import metrics
fpr, tpr, thresholds = metrics.roc_curve(Y_test,p)
I know metrics.roc_auc_score gives the area under the ROC curve. Can anyone tell me which command will find the optimal cut-off point (threshold value)?
Though it's late to answer, I thought this might be helpful. You can do this using the epi package in R (here!); however, I could not find a similar package or example in Python. The optimal cut-off point is where the true positive rate is high and the false positive rate is low. Based on this logic, I have pulled together an example below to find the optimal threshold.
Python code:
The optimal cut-off point is 0.317628, so anything above this can be labeled 1 and anything else 0. You can see from the output/chart that where tpr crosses 1-fpr, tpr is 63%, fpr is 36%, and tpr-(1-fpr) is nearest to zero in the current example.
Output:
Hope this is helpful.
Edit
To simplify this and make it reusable, I have written a function to find the optimal probability cutoff point.
Python Code:
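A minimal sketch of such a function, assuming the criterion described above (pick the threshold where tpr-(1-fpr) is closest to zero). The name find_optimal_cutoff, its signature, and the toy data are illustrative assumptions, not necessarily what the original answer used.

```python
import numpy as np
from sklearn.metrics import roc_curve

def find_optimal_cutoff(y_true, y_score):
    """Return the threshold where tpr is closest to 1 - fpr."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    # Sensitivity (tpr) equals specificity (1 - fpr) where this gap is zero.
    gap = np.abs(tpr - (1 - fpr))
    return thresholds[np.argmin(gap)]

# Toy labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
cutoff = find_optimal_cutoff(y_true, y_score)
print(cutoff)
```

With your own data you would call find_optimal_cutoff(Y_test, p) and label everything scoring at or above the returned value as 1.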
Given tpr, fpr, thresholds from your question, the answer for the optimal threshold is just:
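One common way to write this, sketched here as an assumption rather than a quote of the original answer, is to maximise Youden's J statistic (tpr - fpr) over the thresholds. The array values below are made up for illustration; in practice they would come from metrics.roc_curve(Y_test, p).

```python
import numpy as np

# Hypothetical ROC points standing in for the output of metrics.roc_curve.
fpr = np.array([0.0, 0.0, 0.25, 0.5, 1.0])
tpr = np.array([0.0, 0.5, 0.9, 1.0, 1.0])
thresholds = np.array([np.inf, 0.8, 0.5, 0.3, 0.1])

# Youden's J statistic is tpr - fpr; take the threshold that maximises it.
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]
print(optimal_threshold)
```

Note that np.argmax returns the first index when several thresholds tie for the maximum.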
Vanilla Python Implementation of Youden's J-Score
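A rough sketch of what a library-free version could look like, assuming binary labels coded as 0/1, both classes present, and the rule "predict 1 when score >= cut-off". The function name youden_threshold and the sample data are hypothetical.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximising J = TPR - FPR, using no libraries."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, float("-inf")
    # Every distinct score is a candidate cut-off, tried from high to low.
    for cutoff in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= cutoff and y != 1)
        j = tp / positives - fp / negatives
        if j > best_j:  # strict comparison keeps the highest cut-off on ties
            best_threshold, best_j = cutoff, j
    return best_threshold, best_j

# Toy data, for illustration only.
labels = [0, 0, 1, 0, 0, 1, 1, 1]
scores = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold, j = youden_threshold(labels, scores)
print(threshold, j)
```

This loop is quadratic in the number of samples, which is fine for small data; for larger sets the roc_curve output is the more practical starting point.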
The post by cgnorthcutt is almost correct. The absolute value must be taken.
According to the reference mentioned (http://www.medicalbiostatistics.com/roccurve.pdf, p. 6), I've found another possibility:
opt_idx = np.argmin(np.sqrt(np.square(1-tpr) + np.square(fpr)))
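To show how that line turns into an actual cut-off, here is a small self-contained sketch. It picks the ROC point closest to the top-left corner (fpr = 0, tpr = 1); the array values are made up for illustration and would normally come from metrics.roc_curve(Y_test, p).

```python
import numpy as np

# Hypothetical ROC points standing in for the output of metrics.roc_curve.
fpr = np.array([0.0, 0.0, 0.25, 0.5, 1.0])
tpr = np.array([0.0, 0.5, 0.9, 1.0, 1.0])
thresholds = np.array([np.inf, 0.8, 0.5, 0.3, 0.1])

# Euclidean distance of each ROC point from the ideal corner (0, 1).
distance = np.sqrt(np.square(1 - tpr) + np.square(fpr))
opt_idx = np.argmin(distance)
opt_threshold = thresholds[opt_idx]
print(opt_threshold)
```

This criterion and Youden's J often agree, as they do on these toy values, but they can select different thresholds on other curves.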