I would like to know whether there is a way to plot the average ROC curve from the cross-validation data of an SVM-RFE model generated with the caret package.
My results are:
```
Recursive feature selection

Outer resampling method: Cross-Validated (10 fold, repeated 5 times)

Resampling performance over subset size:

 Variables    ROC   Sens   Spec Accuracy  Kappa  ROCSD SensSD SpecSD AccuracySD KappaSD Selected
         1 0.6911 0.0000 1.0000   0.5900 0.0000 0.2186 0.0000 0.0000     0.0303  0.0000
         2 0.7600 0.3700 0.8067   0.6280 0.1807 0.1883 0.3182 0.2139     0.1464  0.3295
         3 0.7267 0.4233 0.8667   0.6873 0.3012 0.2020 0.3216 0.1905     0.1516  0.3447
         4 0.6989 0.3867 0.8600   0.6680 0.2551 0.2130 0.3184 0.1793     0.1458  0.3336
         5 0.7000 0.3367 0.8600   0.6473 0.2006 0.2073 0.3359 0.1793     0.1588  0.3672
         6 0.7167 0.3833 0.8200   0.6427 0.2105 0.1909 0.3338 0.2539     0.1682  0.3639
         7 0.7122 0.3767 0.8333   0.6487 0.2169 0.1784 0.3226 0.2048     0.1642  0.3702
         8 0.7144 0.4233 0.7933   0.6440 0.2218 0.2017 0.3454 0.2599     0.1766  0.3770
         9 0.8356 0.6533 0.7867   0.7300 0.4363 0.1706 0.3415 0.2498     0.1997  0.4209
        10 0.8811 0.6867 0.8200   0.7647 0.5065 0.1650 0.3134 0.2152     0.1949  0.4053        *
        11 0.8700 0.6933 0.8133   0.7627 0.5046 0.1697 0.3183 0.2147     0.1971  0.4091
        12 0.8678 0.6967 0.7733   0.7407 0.4682 0.1579 0.3153 0.2559
...

The top 5 variables (out of 10):
   SumAverage_GLCM_R1SC4NG2, Variance_GLCM_R1SC4NG2, HGZE_GLSZM_R1SC4NG2, LGZE_GLSZM_R1SC4NG2, SZLGE_GLSZM_R1SC4NG2
```
I have tried the solution mentioned here: ROC curve from training data in caret
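```r
library(pROC)

optSize <- svmRFE_NG2$optsize
# keep only the held-out predictions made with the optimal number of variables
selectedIndices <- svmRFE_NG2$pred$Variables == optSize
plot.roc(svmRFE_NG2$pred$obs[selectedIndices],
         svmRFE_NG2$pred$LUNG[selectedIndices])
```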
However, this does not seem to work: the resulting AUC is quite different from the ROC value caret reports for the selected subset (0.8811 for 10 variables). As suggested in that answer, I have split the results of the training process into the 50 cross-validation resamples, but I do not know what to do next.
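```r
resamples <- split(svmRFE_NG2$pred, svmRFE_NG2$pred$Variables)  # by subset size
optPred <- resamples[[as.character(optSize)]]  # index by name, not by position
resamplesFOLD <- split(optPred, optPred$Resample)  # the 50 held-out folds
```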
Any ideas?
As you already did, you can (a) enable `savePredictions = T` in the `trainControl` parameter of `caret::train`, then (b) use the `pred` element of the trained model object, which contains all predictions over all partitions and resamples, to compute whichever ROC curve you would like to look at. You now have several options for which ROC this can be. For example, you could look at all predictions over all partitions and resamples at once.
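A minimal sketch of the pooled option. It is written in base R so it runs without extra packages (the answer itself used `pROC::roc`; on real data, `pROC::roc(pred$obs, pred$LUNG)` should give the same AUC). The data frame `pred` is simulated to imitate `svmRFE_NG2$pred` after filtering to the optimal subset size; the second class name `OTHER`, the `FoldXX.RepY` resample labels and all values are hypothetical. For an `rfe` object (rather than `caret::train`), the held-out predictions are, as far as I know, only stored when `rfeControl(saveDetails = TRUE)` is used.

```r
# Sketch: one pooled ROC curve over all held-out predictions.
# `pred` is SIMULATED to mimic svmRFE_NG2$pred restricted to the optimal
# subset size; the class "OTHER" and the Resample labels are hypothetical.
set.seed(42)
n_rep <- 5; n_fold <- 10; per_fold <- 10
n <- n_rep * n_fold * per_fold
pred <- data.frame(
  obs = factor(rep(c("LUNG", "OTHER"), length.out = n),
               levels = c("LUNG", "OTHER")),
  Resample = sprintf("Fold%02d.Rep%d",
                     rep(rep(seq_len(n_fold), each = per_fold), times = n_rep),
                     rep(seq_len(n_rep), each = n_fold * per_fold))
)
# Hypothetical probability of "LUNG": higher on average for true LUNG cases
pred$LUNG <- plogis(rnorm(n, mean = ifelse(pred$obs == "LUNG", 1, -1)))

# Empirical ROC: sweep the threshold over every observed score, high to low
roc_points <- function(obs, score, positive = "LUNG") {
  is_pos <- obs == positive
  thr <- sort(unique(score), decreasing = TRUE)
  data.frame(
    fpr = c(0, sapply(thr, function(t) mean(score[!is_pos] >= t))),
    tpr = c(0, sapply(thr, function(t) mean(score[is_pos] >= t)))
  )
}

# Area under the curve by the trapezoidal rule
trapz_auc <- function(curve) {
  sum(diff(curve$fpr) * (head(curve$tpr, -1) + tail(curve$tpr, -1)) / 2)
}

# Pool every held-out prediction from all 50 resamples into a single curve
pooled <- roc_points(pred$obs, pred$LUNG)
pooled_auc <- trapz_auc(pooled)

plot(pooled$fpr, pooled$tpr, type = "l",
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = sprintf("Pooled ROC, AUC = %.3f", pooled_auc))
abline(0, 1, lty = 2)  # chance line
```

On the real object, the simulated `pred` would be replaced by `svmRFE_NG2$pred[svmRFE_NG2$pred$Variables == svmRFE_NG2$optsize, ]`, i.e. the same rows the question's `selectedIndices` picks out.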
Or you could do this over individual partitions and/or resamples (which is what you tried above). Computing one ROC curve per partition and resample gives 50 ROC curves with 10 partitions and 5 repeats.
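A minimal sketch of the per-resample option, again in base R with simulated data (hypothetical class `OTHER` and `FoldXX.RepY` labels, standing in for `svmRFE_NG2$pred` at the optimal size). It splits by `Resample`, computes one ROC curve and AUC per held-out fold, and, since the question asks for an average ROC curve, also averages the 50 curves vertically: each curve's sensitivity is interpolated on a common false-positive-rate grid and averaged point by point. Vertical averaging is my choice of convention here, not something the answer above specifies.

```r
# Sketch: one ROC curve per held-out fold, plus a vertically averaged curve.
# `pred` is SIMULATED; the class "OTHER" and Resample labels are hypothetical.
set.seed(42)
n_rep <- 5; n_fold <- 10; per_fold <- 10
n <- n_rep * n_fold * per_fold
pred <- data.frame(
  obs = factor(rep(c("LUNG", "OTHER"), length.out = n),
               levels = c("LUNG", "OTHER")),
  Resample = sprintf("Fold%02d.Rep%d",
                     rep(rep(seq_len(n_fold), each = per_fold), times = n_rep),
                     rep(seq_len(n_rep), each = n_fold * per_fold))
)
pred$LUNG <- plogis(rnorm(n, mean = ifelse(pred$obs == "LUNG", 1, -1)))

# Empirical ROC: sweep the threshold over every observed score, high to low
roc_points <- function(obs, score, positive = "LUNG") {
  is_pos <- obs == positive
  thr <- sort(unique(score), decreasing = TRUE)
  data.frame(
    fpr = c(0, sapply(thr, function(t) mean(score[!is_pos] >= t))),
    tpr = c(0, sapply(thr, function(t) mean(score[is_pos] >= t)))
  )
}
trapz_auc <- function(curve) {
  sum(diff(curve$fpr) * (head(curve$tpr, -1) + tail(curve$tpr, -1)) / 2)
}

# One curve per resample: 10 folds x 5 repeats = 50 curves
curves <- lapply(split(pred, pred$Resample),
                 function(d) roc_points(d$obs, d$LUNG))
fold_auc <- sapply(curves, trapz_auc)

# Vertical averaging: mean sensitivity at each false positive rate on a grid.
# Tied FPR values (vertical steps) are collapsed to their highest TPR.
fpr_grid <- seq(0, 1, length.out = 101)
tpr_mat <- sapply(curves, function(cv)
  approx(cv$fpr, cv$tpr, xout = fpr_grid, ties = max)$y)
mean_tpr <- rowMeans(tpr_mat)

plot(NA, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = sprintf("Mean AUC over %d resamples = %.3f (SD %.3f)",
                    length(fold_auc), mean(fold_auc), sd(fold_auc)))
for (cv in curves) lines(cv$fpr, cv$tpr, col = "grey80")  # individual folds
lines(fpr_grid, mean_tpr, lwd = 2)                          # averaged curve
abline(0, 1, lty = 2)
```

`mean(fold_auc)` and `sd(fold_auc)` are the per-resample mean and standard deviation, which is how caret summarizes the ROC and ROCSD columns; the area under the averaged curve will be close to, but not necessarily identical to, `mean(fold_auc)`.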
Depending on your data and model, the latter will show a certain amount of variance in the resulting ROC curves and AUC values. You can see the same variance in the `ROCSD` values caret reports, i.e. the standard deviation of the AUCs calculated for your individual partitions and resamples, so this variance comes from your data and model and is not an error.

BTW: I used the `pROC::roc` function for these calculations, but you could use any suitable function here. Also, when using `caret::train`, obtaining the ROC works the same way regardless of the model type.

I know this post is old, but I have the same issue: I am trying to understand why I get different results when I calculate the ROC value from each resample and when I calculate it using all predictions and resamples at once. Which method for calculating the ROC is correct?

(Apologies for posting this as a new answer, but I am not allowed to post a comment.)
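A small deterministic illustration of why the two numbers can disagree, using hypothetical toy scores and base R only. caret's ROC column is the mean of the per-resample AUCs (and ROCSD their standard deviation), whereas a pooled AUC ranks every positive against every negative from all folds, so it also penalizes scores that are not on a comparable scale across folds. Neither is wrong as such; they summarize different things, and the per-resample mean is the one that matches caret's output.

```r
# Toy example: two folds, each perfectly separated, but on different score
# scales. All values are hypothetical.
auc_rank <- function(is_pos, score) {
  # Mann-Whitney form of the AUC; average ranks count ties as 1/2
  r <- rank(score)
  n1 <- sum(is_pos); n0 <- sum(!is_pos)
  (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

toy <- data.frame(
  fold   = c("Fold1", "Fold1", "Fold2", "Fold2"),
  is_pos = c(TRUE, FALSE, TRUE, FALSE),
  score  = c(0.9, 0.8, 0.3, 0.2)
)

per_fold <- sapply(split(toy, toy$fold),
                   function(d) auc_rank(d$is_pos, d$score))
mean(per_fold)                   # 1: the per-resample average (caret-style)
auc_rank(toy$is_pos, toy$score)  # 0.75: pooled over both folds
```

In the pooled version, the positive from Fold2 (score 0.3) is compared with the negative from Fold1 (score 0.8) and loses, even though each fold on its own ranks its cases perfectly.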