I have 21 classes. I am using RandomForest. I want to plot a ROC curve, so I checked the example in scikit ROC with SVM
The example uses SVM. SVM has parameters like: probability and decision_function_shape which RF does not.
So how can I binarize RandomForest and plot a ROC?
Thank you
EDIT
To create the fake data. So there are 20 features and 21 classes (3 samples for each class).
df = pd.DataFrame(np.random.rand(63, 20))
label = np.arange(len(df)) // 3 + 1
df['label']=label
df
#TO TRAIN THE MODEL: IT IS A STRATIFIED SHUFFLED SPLIT
clf = make_pipeline(RandomForestClassifier())
xSSSmean10 = []
for i in range(10):
sss = StratifiedShuffleSplit(y, 10, test_size=0.1, random_state=i)
scoresSSS = cross_validation.cross_val_score(clf, x, y , cv=sss)
xSSSmean10.append(scoresSSS.mean())
result_list.append(xSSSmean10)
print("")
For multilabel random forest, each of your 21 labels has a binary classification, and you can create a ROC curve for each of the 21 classes. Your y_train should be a matrix of 0 and 1 for each label.
Assume you fit a multilabel random forest from sklearn and called it rf, and have a X_test and y_test after a test train split. You can plot the ROC curve in python for your first label using this:
Hope this helps. If you provide your code and data I could write this more specifically.