Calculate AUC in R?

Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?

Page 9 of "AUC: a Better Measure..." seems to require knowing the class labels, and here is an example in MATLAB where I don't understand

R(Actual == 1))

Because R (not to be confused with the R language) is defined a vector but used as a function?

标签： r machine-learning data-mining auc

10条回答

欢心

2楼-- · 2019-01-29 22:34

As mentioned by others, you can compute the AUC using the ROCR package. With the ROCR package you can also plot the ROC curve, lift curve and other model selection measures.

You can compute the AUC directly without using any package by using the fact that the AUC is equal to the probability that a true positive is scored greater than a true negative.

For example, if pos.scores is a vector containing a score of the positive examples, and neg.scores is a vector containing the negative examples then the AUC is approximated by:

> mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T))
[1] 0.7261

will give an approximation of the AUC. You can also estimate the variance of the AUC by bootstrapping:

> aucs = replicate(1000,mean(sample(pos.scores,1000,replace=T) > sample(neg.scores,1000,replace=T)))

0人赞添加讨论(0) 举报

可以哭但决不认输i

3楼-- · 2019-01-29 22:35

With the package pROC you can use the function auc() like this example from the help page:

> data(aSAH)
> 
> # Syntax (response, predictor):
> auc(aSAH$outcome, aSAH$s100b)
Area under the curve: 0.7314

0人赞添加讨论(0) 举报

等我变得足够好

4楼-- · 2019-01-29 22:38

Combining code from ISL 9.6.3 ROC Curves, along with @J. Won.'s answer to this question and a few more places, the following plots the ROC curve and prints the AUC in the bottom right on the plot.

Below probs is a numeric vector of predicted probabilities for binary classification and test$label contains the true labels of the test data.

require(ROCR)
require(pROC)

rocplot <- function(pred, truth, ...) {
  predob = prediction(pred, truth)
  perf = performance(predob, "tpr", "fpr")
  plot(perf, ...)
  area <- auc(truth, pred)
  area <- format(round(area, 4), nsmall = 4)
  text(x=0.8, y=0.1, labels = paste("AUC =", area))

  # the reference x=y line
  segments(x0=0, y0=0, x1=1, y1=1, col="gray", lty=2)
}

rocplot(probs, test$label, col="blue")

This gives a plot like this:

0人赞添加讨论(0) 举报

爷的心禁止访问

5楼-- · 2019-01-29 22:41

The ROCR package will calculate the AUC among other statistics:

auc.tmp <- performance(pred,"auc"); auc <- as.numeric(auc.tmp@y.values)

0人赞添加讨论(0) 举报

Luminary・发光体

6楼-- · 2019-01-29 22:42

You can learn more about AUROC in this blog post by Miron Kursa:

https://mbq.me/blog/augh-roc/

He provides a fast function for AUROC:

# By Miron Kursa https://mbq.me
auroc <- function(score, bool) {
  n1 <- sum(!bool)
  n2 <- sum(bool)
  U  <- sum(rank(score)[!bool]) - n1 * (n1 + 1) / 2
  return(1 - U / n1 / n2)
}

Let's test it:

set.seed(42)
score <- rnorm(1e3)
bool  <- sample(c(TRUE, FALSE), 1e3, replace = TRUE)

pROC::auc(bool, score)
mltools::auc_roc(score, bool)
ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values[[1]]
auroc(score, bool)

0.51371668847094
0.51371668847094
0.51371668847094
0.51371668847094

auroc() is 100 times faster than pROC::auc() and computeAUC().

auroc() is 10 times faster than mltools::auc_roc() and ROCR::performance().

print(microbenchmark(
  pROC::auc(bool, score),
  computeAUC(score[bool], score[!bool]),
  mltools::auc_roc(score, bool),
  ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values,
  auroc(score, bool)
))

Unit: microseconds
                                                             expr       min
                                           pROC::auc(bool, score) 21000.146
                            computeAUC(score[bool], score[!bool]) 11878.605
                                    mltools::auc_roc(score, bool)  5750.651
 ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values  2899.573
                                               auroc(score, bool)   236.531
         lq       mean     median        uq        max neval  cld
 22005.3350 23738.3447 22206.5730 22710.853  32628.347   100    d
 12323.0305 16173.0645 12378.5540 12624.981 233701.511   100   c 
  6186.0245  6495.5158  6325.3955  6573.993  14698.244   100  b  
  3019.6310  3300.1961  3068.0240  3237.534  11995.667   100 ab  
   245.4755   253.1109   251.8505   257.578    300.506   100 a

0人赞添加讨论(0) 举报

成全新的幸福

7楼-- · 2019-01-29 22:44

I usually use the function ROC from the DiagnosisMed package. I like the graph it produces. AUC is returned along with it's confidence interval and it is also mentioned on the graph.

ROC(classLabels,scores,Full=TRUE)

0人赞添加讨论(0) 举报

1 2 下一页

Calculate AUC in R?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间