How to binarize RandomForest to plot a ROC in Python

Posted 2019-07-28 11:19

I have 21 classes and I am using RandomForest. I want to plot a ROC curve, so I checked the ROC example in the scikit-learn documentation.

That example uses an SVM. The SVM has parameters such as probability and decision_function_shape, which RandomForest does not have.

So how can I binarize RandomForest and plot a ROC?

Thank you

EDIT

This is how I create the fake data. There are 20 features and 21 classes (3 samples for each class).

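import numpy as np
import pandas as pd

# 63 rows x 20 random features; labels 1..21, three consecutive rows per class
df = pd.DataFrame(np.random.rand(63, 20))
label = np.arange(len(df)) // 3 + 1
df['label'] = label
df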


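# To train the model: repeated stratified shuffled splits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline

# Assumption: x holds the 20 feature columns and y the label column of df
x = df.drop(columns='label')
y = df['label']
clf = make_pipeline(RandomForestClassifier())
xSSSmean10 = []
result_list = []
for i in range(10):
    # A stratified test set needs at least one sample per class (21 here),
    # so test_size=0.1 of 63 rows (7 samples) raises an error; use 21 instead
    sss = StratifiedShuffleSplit(n_splits=10, test_size=21, random_state=i)
    scoresSSS = cross_val_score(clf, x, y, cv=sss)
    xSSSmean10.append(scoresSSS.mean())
result_list.append(xSSSmean10)
print(xSSSmean10)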

1 answer
聊天终结者
#2 · 2019-07-28 11:49

For a multilabel random forest, each of your 21 labels is a separate binary classification problem, so you can create one ROC curve per label. Your y_train should be a 0/1 indicator matrix with one column per label.
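One way to build such a 0/1 matrix from integer class labels is scikit-learn's label_binarize. The snippet below is only a minimal sketch: the label vector y is made up to mirror the fake data in the question (classes 1 to 21, three samples each), and the names y, classes and Y are illustrative.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Hypothetical labels shaped like the question's fake data: classes 1..21, 3 samples each
y = np.arange(63) // 3 + 1
classes = np.arange(1, 22)

# One column per class; an entry is 1 where the sample belongs to that class, else 0
Y = label_binarize(y, classes=classes)
print(Y.shape)
```

Each row of Y contains exactly one 1, so column k can serve as the binary target for the ROC curve of class k.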

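Assume you have fitted a multilabel random forest with sklearn, called it rf, and have X_test and y_test from a train/test split. For a multilabel fit, rf.predict_proba returns one array per label, so probs[0][:, 1] is the predicted probability of the positive class for the first label. You can plot the ROC curve in Python for your first label using this: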

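from sklearn import metrics
import matplotlib.pyplot as plt
# For a multilabel fit, predict_proba returns one (n_samples, 2) array per label
probs = rf.predict_proba(X_test)
fpr, tpr, threshs = metrics.roc_curve(y_test['name_of_your_first_tag'], probs[0][:, 1])
plt.plot(fpr, tpr)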
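If the forest is instead fitted directly on the 21 integer class labels (a multiclass fit rather than the multilabel fit above), predict_proba returns a single (n_samples, n_classes) array, and a one-vs-rest ROC curve per class can be computed by binarizing only the test labels. The following is a rough sketch under that assumption, not the setup described above: the data are random, the sizes (10 samples per class, a 30% test split) are invented so every class appears in both splits, and names such as roc and Y_test are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

rng = np.random.RandomState(0)
n_classes, per_class = 21, 10              # hypothetical sizes
X = rng.rand(n_classes * per_class, 20)    # random features, as in the fake data
y = np.arange(len(X)) // per_class + 1     # labels 1..21

# Stratify so that every class is present in both the train and the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
probs = rf.predict_proba(X_test)           # columns follow the order of rf.classes_

# Binarize the true test labels in the same column order as the probabilities
Y_test = label_binarize(y_test, classes=rf.classes_)

# One-vs-rest ROC curve and AUC for each class
roc = {}
for k, cls in enumerate(rf.classes_):
    fpr, tpr, _ = roc_curve(Y_test[:, k], probs[:, k])
    roc[cls] = (fpr, tpr, auc(fpr, tpr))
```

Each (fpr, tpr) pair in roc can then be passed to matplotlib's plt.plot to draw the curve for that class. With random features the curves should stay close to the diagonal.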

Hope this helps. If you provide your code and data I could write this more specifically.
