How to plot a ROC curve using ROCR package in r, *

How to plot a ROC curve using ROCR package in r, with only a classification contingency table?

I have a contingency table where the true positive, false positive.. etc. all the rated can be computed. I have 500 replications, therefore 500 tables. But, I can not generate a prediction data indicating each single case of estimating probability and the truth. How can I get a curve without the individual data. Below is the package instruction used.

## computing a simple ROC curve (x-axis: fpr, y-axis: tpr)
library(ROCR)
data(ROCR.simple)
pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf)

标签： r plot classification roc

1条回答

兄弟一词,经得起流年.

2楼-- · 2020-07-11 07:28

You cannot generate the full ROC curve with a single contingency table because a contingency table provides only a single sensitivity/specificity pair (for whatever predictive cutoff was used to generate the contingency table).

If you had many contingency tables that were generated with different cutoffs, you would be able to approximate the ROC curve (basically it will be a linear interpolation between the sensitivity/specificity values in your contingency tables). As an example, let's consider predicting whether a flower is versicolor in the iris dataset using logistic regression:

iris$isv <- as.numeric(iris$Species == "versicolor")
mod <- glm(isv~Sepal.Length+Sepal.Width, data=iris, family="binomial")

We could use the standard ROCR code to compute the ROC curve for this model:

library(ROCR)
pred1 <- prediction(predict(mod), iris$isv)
perf1 <- performance(pred1,"tpr","fpr")
plot(perf1)

enter image description here

Now let's assume that instead of mod all we have is contingency tables with a number of cutoffs values for predictions:

tables <- lapply(seq(0, 1, .1), function(x) table(iris$isv, factor(predict(mod, type="response") >= x, levels=c(F, T))))

# Predict TRUE if predicted probability at least 0
tables[[1]]
#     FALSE TRUE
#   0     0  100
#   1     0   50

# Predict TRUE if predicted probability at least 0.5
tables[[6]]
#     FALSE TRUE
#   0    86   14
#   1    29   21

# Predict TRUE if predicted probability at least 1
tables[[11]]
#     FALSE TRUE
#   0   100    0
#   1    50    0

From one table to the next some predictions changed from TRUE to FALSE due to the increased cutoff, and by comparing column 1 of the successive table we can determine which of these represent true negative and false negative predictions. Iterating through our ordered list of contingency tables we can create fake predicted value/outcome pairs that we can pass to ROCR, ensuring that we match the sensitivity/specificity for each contingency table.

fake.info <- do.call(rbind, lapply(1:(length(tables)-1), function(idx) {
  true.neg <- tables[[idx+1]][1,1] - tables[[idx]][1,1]
  false.neg <- tables[[idx+1]][2,1] - tables[[idx]][2,1]
  if (true.neg <= 0 & false.neg <= 0) {
    return(NULL)
  } else {
    return(data.frame(fake.pred=idx,
                      outcome=rep(c(0, 1), times=c(true.neg, false.neg))))
  }
}))

Now we can pass the faked predictions to ROCR as usual:

pred2 <- prediction(fake.info$fake.pred, fake.info$outcome)
perf2 <- performance(pred2,"tpr","fpr")
plot(perf2)

enter image description here

Basically what we have done is a linear interpolation of the points that we do have on the ROC curve. If you had contingency tables for many cutoffs you could more closely approximate the true ROC curve. If you don't have a wide range of cutoffs you can't hope to accurately reproduce the full ROC curve.

0人赞添加讨论(0) 举报

How to plot a ROC curve using ROCR package in r, *

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间