How to binarize RandomForest to plot a ROC curve in Python

Posted 2019-07-28 11:52

Question:

I have 21 classes and I am using RandomForest. I want to plot a ROC curve, so I checked the scikit-learn example "ROC with SVM".

The example uses SVM, which has parameters such as probability and decision_function_shape that RF does not have.

So how can I binarize RandomForest and plot a ROC?

Thank you

EDIT

To create the fake data (20 features and 21 classes, with 3 samples per class):

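import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(63, 20))  # 63 samples x 20 random features
label = np.arange(len(df)) // 3 + 1        # labels 1..21, three samples each
df['label'] = label
df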


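# TO TRAIN THE MODEL: IT IS A STRATIFIED SHUFFLED SPLIT
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline

x = df.drop(columns='label').values
y = df['label'].values

clf = make_pipeline(RandomForestClassifier())
result_list = []
xSSSmean10 = []
for i in range(10):
    # test_size must be at least the number of classes (21): hold out one sample per class
    sss = StratifiedShuffleSplit(n_splits=10, test_size=21, random_state=i)
    scoresSSS = cross_val_score(clf, x, y, cv=sss)
    xSSSmean10.append(scoresSSS.mean())
result_list.append(xSSSmean10)
print(result_list)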
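If the goal is a ROC curve under this cross-validation setup, one option is to collect out-of-fold class probabilities and score each class one-vs-rest. The sketch below is an assumption-laden illustration, not the asker's method: it swaps StratifiedShuffleSplit for StratifiedKFold (because `cross_val_predict` needs folds in which every sample is tested exactly once) and fixes a random seed so the fake data is reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.preprocessing import label_binarize

# Same fake data as above (seeded here for reproducibility): 63 samples, 20 features, 21 classes
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(63, 20))
df['label'] = np.arange(len(df)) // 3 + 1
x = df.drop(columns='label').values
y = df['label'].values
classes = np.unique(y)

# Assumption: StratifiedKFold instead of StratifiedShuffleSplit, since cross_val_predict
# requires a partition; 3 folds puts exactly one sample of each class in each test fold
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
clf = RandomForestClassifier(random_state=0)

# Out-of-fold probabilities, shape (63, 21); columns follow the sorted class labels
probs = cross_val_predict(clf, x, y, cv=cv, method='predict_proba')

# One-vs-rest 0/1 indicator matrix, shape (63, 21)
y_bin = label_binarize(y, classes=classes)

# One ROC curve per class
roc_per_class = {}
for k, c in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, k], probs[:, k])
    roc_per_class[c] = (fpr, tpr)

# Macro-averaged one-vs-rest AUC over the 21 classes
macro_auc = roc_auc_score(y_bin, probs, average='macro')
print(macro_auc)
```

Because the features are random, the AUC values should hover around chance; the point is only the mechanics of turning cross-validated RF probabilities into per-class ROC curves.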

Answer 1:

For a multilabel random forest, each of your 21 labels is a separate binary classification problem, so you can create a ROC curve for each of the 21 classes. For this, your y_train should be a matrix of 0s and 1s with one column per label.
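A minimal end-to-end sketch of this multilabel setup, assuming the question's fake data (seeded here), a stratified split that holds out one sample per class, and hypothetical column names `class_1` ... `class_21` for the binarized labels. `label_binarize` produces the 0/1 matrix, and fitting `RandomForestClassifier` on a 2-D target makes it a multi-output forest.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Fake data as in the question: 20 features, 21 classes, 3 samples per class
rng = np.random.RandomState(0)
X = rng.rand(63, 20)
y = np.arange(63) // 3 + 1
classes = np.unique(y)

# Hold out one sample per class, so every class keeps 2 training samples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=21, stratify=y, random_state=0)

# 0/1 matrix with one column per class (column names are hypothetical)
cols = [f'class_{c}' for c in classes]
y_train_bin = pd.DataFrame(label_binarize(y_train, classes=classes), columns=cols)
y_test_bin = pd.DataFrame(label_binarize(y_test, classes=classes), columns=cols)

# A 2-D target makes this a multi-output (multilabel) forest
rf = RandomForestClassifier(random_state=0)
rf.fit(X_train, y_train_bin)

# predict_proba returns a list with one (n_test, 2) array per label
probs = rf.predict_proba(X_test)

# ROC curve and AUC for every label; column 1 of each array is P(label == 1)
curves = {}
for k, name in enumerate(cols):
    fpr, tpr, threshs = metrics.roc_curve(y_test_bin[name], probs[k][:, 1])
    curves[name] = (fpr, tpr, metrics.auc(fpr, tpr))
```

With only one positive test sample per class, each curve is very coarse; more data per class would give smoother curves.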

Assume you fit a multilabel random forest from sklearn, named it rf, and have X_test and y_test from a train/test split. You can compute the ROC curve (the points to plot) for your first label like this:

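from sklearn import metrics
# For a multi-output (multilabel) fit, predict_proba returns one (n_samples, 2) array per label
probs = rf.predict_proba(X_test)
# Column 1 of the first array holds P(first label == 1)
fpr, tpr, threshs = metrics.roc_curve(y_test['name_of_your_first_tag'], probs[0][:, 1])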
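Alternatively, no binarization is needed at training time: a plain multiclass `RandomForestClassifier` already exposes `predict_proba` with one column per class (which is why RF has no `probability` parameter like SVM), so you only binarize the true test labels and plot one-vs-rest curves. The sketch below assumes the question's fake data (seeded), a one-sample-per-class test split, the matplotlib Agg backend, and a hypothetical output filename.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

rng = np.random.RandomState(0)
X = rng.rand(63, 20)
y = np.arange(63) // 3 + 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=21, stratify=y, random_state=0)

# Ordinary multiclass fit on the original labels 1..21
rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
probs = rf.predict_proba(X_test)  # shape (n_test, 21), columns ordered as rf.classes_

# Binarize only the true test labels, in the same column order as probs
y_test_bin = label_binarize(y_test, classes=rf.classes_)

fig, ax = plt.subplots(figsize=(7, 7))
aucs = []
for k, c in enumerate(rf.classes_):
    fpr, tpr, _ = metrics.roc_curve(y_test_bin[:, k], probs[:, k])
    aucs.append(metrics.auc(fpr, tpr))
    ax.plot(fpr, tpr, lw=1, label=f'class {c} (AUC = {aucs[-1]:.2f})')
ax.plot([0, 1], [0, 1], 'k--', lw=1)  # chance line
ax.set_xlabel('False positive rate')
ax.set_ylabel('True positive rate')
ax.set_title('One-vs-rest ROC curves, RandomForest')
ax.legend(fontsize='x-small', loc='lower right')
fig.savefig('roc_random_forest.png')  # hypothetical filename
```

Passing `classes=rf.classes_` to `label_binarize` keeps the indicator columns aligned with the probability columns, which is the main thing that can silently go wrong here.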

Hope this helps. If you provide your code and data, I can write this more specifically.