ROC curve for classification from randomForest

Posted 2020-06-04 03:11

Question:

I am using the randomForest package in R for a classification task.

rf_object<-randomForest(data_matrix, label_factor, cutoff=c(k,1-k))

where k ranges from 0.1 to 0.9.

pred <- predict(rf_object,test_data_matrix)

I compared the output of the random forest classifier with the labels, so I have performance measures such as accuracy, MCC, sensitivity and specificity at 9 cutoff points.

Now I want to plot the ROC curve and obtain the area under it to see how good the performance is. Most R packages (such as ROCR and pROC) require predictions and labels, but I only have sensitivity (TPR) and specificity (1 - FPR).

Can anyone tell me whether the cutoff method is a correct or reliable way to produce an ROC curve? Is there a way to obtain the ROC curve and the area under it from TPR and FPR?

I also tried training the random forest with the following commands. This way the predictions were continuous and were accepted by the ROCR and pROC packages. But I am not sure whether this is the correct way to do it. Can anyone advise on this method?

rf_object <- randomForest(data_matrix, label_vector)
pred <- predict(rf_object, test_data_matrix)

Thank you for taking the time to read my problem! I have spent a long time searching for a solution. Any suggestions or advice would be appreciated.

Answer 1:

Why don't you output class probabilities? That gives you a ranking of your predictions, which you can pass directly to any ROC package.

m = randomForest(data_matrix, labels)
predict(m,newdata_matrix,type='prob')
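
To illustrate why a ranking is all an ROC analysis needs, here is a rough base-R sketch that computes the AUC from the positive-class probability column using the rank-sum (Mann-Whitney) identity. The `prob` matrix and `truth` factor are made-up stand-ins for the output of `predict(m, newdata_matrix, type='prob')` and the test labels, and `auc_from_scores` is a hypothetical helper, not part of randomForest.

```r
# Toy stand-in for predict(m, newdata_matrix, type = "prob"):
# one row per test sample, one column per class level (values are made up).
prob <- cbind(neg = c(0.9, 0.6, 0.7, 0.2, 0.4, 0.1),
              pos = c(0.1, 0.4, 0.3, 0.8, 0.6, 0.9))
truth <- factor(c("neg", "neg", "pos", "pos", "neg", "pos"),
                levels = c("neg", "pos"))

# Hypothetical helper: AUC via the rank-sum (Mann-Whitney U) identity.
# It equals the fraction of (positive, negative) pairs in which the
# positive sample gets the higher score, counting ties as one half.
auc_from_scores <- function(scores, labels, positive) {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(scores)  # average ranks, so tied scores are handled
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Use the column of the class you treat as "positive" as the score.
scores <- prob[, "pos"]
auc <- auc_from_scores(scores, truth, positive = "pos")
print(auc)
```

If ROCR or pROC is installed, the same `scores` and `truth` can instead be passed to `prediction()` followed by `performance()`, or to `roc()`, which also draw the curve.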

Note that to use randomForest for classification, labels must be a factor.
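
On the second part of the question: if only per-cutoff sensitivity and specificity are available, a coarse ROC curve can still be approximated by joining the (FPR, TPR) points and applying the trapezoidal rule. The sketch below uses made-up values and a hypothetical helper `auc_trapezoid`. With only a handful of cutoffs the result is an approximation of the curve you would get from the full probability ranking.

```r
# Made-up sensitivity/specificity pairs, one per cutoff (illustrative only).
sens <- c(0.95, 0.90, 0.80, 0.60, 0.30)
spec <- c(0.20, 0.50, 0.70, 0.90, 0.98)

# Hypothetical helper: area under the piecewise-linear ROC curve.
auc_trapezoid <- function(tpr, fpr) {
  # Anchor the curve at (0, 0) and (1, 1), then sort the points by FPR.
  fpr <- c(0, fpr, 1)
  tpr <- c(0, tpr, 1)
  o <- order(fpr, tpr)
  fpr <- fpr[o]
  tpr <- tpr[o]
  # Trapezoid rule: segment width times the mean height of its two ends.
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

fpr <- 1 - spec  # FPR is 1 - specificity
auc <- auc_trapezoid(sens, fpr)
print(auc)
```

Plotting `sens` against `fpr` (for example with `plot(fpr, sens, type = "b")`) gives the corresponding rough curve.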