Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?
Page 9 of "AUC: a Better Measure..." seems to require knowing the class labels, and here is a fragment of a MATLAB example that I don't understand:
R(Actual == 1))
because R (not to be confused with the R language) is defined as a vector but seems to be used here as a function?
As mentioned by others, you can compute the AUC using the ROCR package. With the ROCR package you can also plot the ROC curve, lift curve and other model selection measures.
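For instance, a sketch with placeholder names scores (the classifier scores) and labels (the true classes):

```r
library(ROCR)

# scores: predicted scores or probabilities; labels: true classes (placeholder names)
pred <- prediction(scores, labels)

# ROC curve: true positive rate vs. false positive rate
plot(performance(pred, "tpr", "fpr"))

# Lift curve: lift vs. rate of positive predictions
plot(performance(pred, "lift", "rpp"))
```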
You can compute the AUC directly without using any package by using the fact that the AUC is equal to the probability that a true positive is scored greater than a true negative.
For example, if pos.scores is a vector containing the scores of the positive examples and neg.scores is a vector containing the scores of the negative examples, then the AUC is approximated by comparing randomly sampled pairs of scores. You can also estimate the variance of the AUC by bootstrapping; both are sketched below.
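A minimal sketch of that sampling approach, assuming pos.scores and neg.scores as described (the number of draws is arbitrary):

```r
# AUC ~ P(score of a random positive > score of a random negative)
mean(sample(pos.scores, 1000, replace = TRUE) > sample(neg.scores, 1000, replace = TRUE))

# Repeat the estimate many times (a simple bootstrap) to gauge its variability
aucs <- replicate(1000,
  mean(sample(pos.scores, 1000, replace = TRUE) >
       sample(neg.scores, 1000, replace = TRUE)))
sd(aucs)
```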
With the pROC package you can use the auc() function, like this example based on its help page:
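A sketch using the example data (aSAH) shipped with pROC; see the package help for the exact example:

```r
library(pROC)
data(aSAH)

# auc(response, predictor)
auc(aSAH$outcome, aSAH$s100b)
```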
Combining code from ISL 9.6.3 "ROC Curves", along with @J. Won.'s answer to this question and a few more places, the following plots the ROC curve and prints the AUC in the bottom right of the plot.
Below, probs is a numeric vector of predicted probabilities for binary classification and test$label contains the true labels of the test data.
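A sketch along those lines, using ROCR for the curve and the AUC (the plotting details are illustrative):

```r
library(ROCR)

rocplot <- function(pred, truth, ...) {
  predob <- prediction(pred, truth)
  perf   <- performance(predob, "tpr", "fpr")
  plot(perf, ...)
  # reference diagonal
  segments(x0 = 0, y0 = 0, x1 = 1, y1 = 1, col = "gray", lty = 2)
  # AUC, printed in the bottom right corner of the plot
  area <- performance(predob, measure = "auc")@y.values[[1]]
  text(x = 0.8, y = 0.1, labels = paste("AUC =", format(round(area, 4), nsmall = 4)))
}

rocplot(probs, test$label, col = "blue")
```

This gives a plot of the ROC curve with the reference diagonal and the AUC annotated in the bottom right.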
The ROCR package will calculate the AUC among other statistics:
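For example (scores and labels are placeholder names for the predictions and the true classes):

```r
library(ROCR)

pred    <- prediction(scores, labels)
auc.tmp <- performance(pred, measure = "auc")
auc     <- as.numeric(auc.tmp@y.values)
auc
```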
You can learn more about AUROC in this blog post by Miron Kursa:
https://mbq.me/blog/augh-roc/
He provides a fast function for AUROC:
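A rank-based (Wilcoxon/Mann-Whitney) version along the lines of that post; treat it as a sketch rather than the post's exact code:

```r
auroc <- function(score, bool) {
  # bool: TRUE for positives, FALSE for negatives
  n1 <- sum(!bool)  # number of negatives
  n2 <- sum(bool)   # number of positives
  # Mann-Whitney U for the negative group, from the sum of its ranks
  U <- sum(rank(score)[!bool]) - n1 * (n1 + 1) / 2
  1 - U / (n1 * n2)
}
```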
Let's test it:
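For example, on made-up data it should agree with pROC (a sketch):

```r
set.seed(42)
score <- rnorm(1e3)
bool  <- sample(c(TRUE, FALSE), 1e3, replace = TRUE)

auroc(score, bool)
# direction fixed so that higher scores mean the positive class
pROC::auc(bool, score, direction = "<")
```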
auroc() is 100 times faster than pROC::auc() and computeAUC(). auroc() is 10 times faster than mltools::auc_roc() and ROCR::performance().
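Roughly how those timings could be checked (a sketch; computeAUC() is left out because its package is not named here, and the exact ratios depend on the machine and data size):

```r
library(microbenchmark)

set.seed(42)
score <- rnorm(1e4)
bool  <- sample(c(TRUE, FALSE), 1e4, replace = TRUE)

microbenchmark(
  auroc   = auroc(score, bool),
  pROC    = pROC::auc(bool, score, direction = "<"),
  mltools = mltools::auc_roc(score, as.integer(bool)),
  ROCR    = ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values[[1]],
  times   = 20
)
```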
I usually use the ROC function from the DiagnosisMed package. I like the graph it produces: the AUC is returned along with its confidence interval, and it is also shown on the graph.