XGBoost for multilabel classification?

Posted 2019-04-04 13:52

Is it possible to use XGBoost for multilabel classification? Right now I use sklearn's OneVsRestClassifier over GradientBoostingClassifier. It works, but it uses only one core of my CPU. My data has ~45 features, and the task is to predict about 20 columns of binary (boolean) data. The metric is mean average precision (MAP@7). If you have a short code example to share, that would be great.

2 Answers
叼着烟拽天下
#2 · 2019-04-04 14:09

You can add a label indicating which output each row predicts. For example, if this is your data:

X1 X2 X3 X4  Y1 Y2 Y3
 1  3  4  6   7  8  9
 2  5  5  5   5  3  2

You can reshape your data by adding a label column to the input that records which output each row corresponds to, and XGBoost should learn how to treat it accordingly, like so:

X1 X2 X3 X4 X_label Y
 1  3  4  6   1     7
 1  3  4  6   2     8
 1  3  4  6   3     9
 2  5  5  5   1     5
 2  5  5  5   2     3
 2  5  5  5   3     2

This way you will have a 1-dimensional Y, but you can still predict many labels.
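
A minimal sketch of that reshape with pandas (the melt-based approach and column names here are my own illustration, not part of the answer):

import pandas as pd

# Hypothetical wide-format data: one row per sample, one column per output.
df = pd.DataFrame({
    "X1": [1, 2], "X2": [3, 5], "X3": [4, 5], "X4": [6, 5],
    "Y1": [7, 5], "Y2": [8, 3], "Y3": [9, 2],
})

# Melt the Y columns into (X_label, Y) pairs: each sample is repeated
# once per output, and X_label records which output the row refers to.
long_df = pd.melt(
    df,
    id_vars=["X1", "X2", "X3", "X4"],
    value_vars=["Y1", "Y2", "Y3"],
    var_name="X_label",
    value_name="Y",
)
# Encode the label as a number (Y1 -> 1, Y2 -> 2, Y3 -> 3).
long_df["X_label"] = long_df["X_label"].str[1:].astype(int)
print(long_df.sort_values(["X1", "X_label"]))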

叛逆
#3 · 2019-04-04 14:17

There are a couple of ways to do that, one of which is the one you already suggested:

1.

from xgboost import XGBClassifier
from sklearn.multiclass import OneVsRestClassifier
# If you want to avoid the OneVsRestClassifier magic switch
# from sklearn.multioutput import MultiOutputClassifier

clf_multilabel = OneVsRestClassifier(XGBClassifier(**params))

clf_multilabel will fit one binary classifier per class, and it will use however many cores you specify in params (FYI, you can also specify n_jobs in OneVsRestClassifier, but that uses more memory).
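
A minimal usage sketch, assuming X is a feature matrix and Y is an (n_samples, 20) binary indicator matrix as in the question (the random data and parameter values are illustrative, not from the answer):

import numpy as np
from xgboost import XGBClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 45))          # ~45 features, as in the question
Y = rng.integers(0, 2, size=(100, 20))  # 20 binary target columns

# n_jobs on XGBClassifier controls how many cores each booster uses.
clf_multilabel = OneVsRestClassifier(XGBClassifier(n_estimators=50, n_jobs=4))
clf_multilabel.fit(X, Y)

# One probability column per label; rank these per sample for MAP@7.
proba = clf_multilabel.predict_proba(X)
print(proba.shape)  # (100, 20)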

2. If you first massage your data a little by making k copies of every data point that has k correct labels, you can hack your way to a simpler multiclass problem. At that point, just

clf = XGBClassifier(**params)
clf.fit(train_data, train_labels)          # train_labels: the single label of each copy
pred_proba = clf.predict_proba(test_data)  # one probability column per class

to get classification margins/probabilities for each class, then decide what threshold you want for predicting a label. Note that this solution is not exact: if a product has tags (1, 2, 3), each copy presents only one of those tags as the correct class, so you artificially introduce two negative samples for each class.
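
A minimal sketch of that data massage (the toy data and variable names are my own illustration):

import numpy as np
from xgboost import XGBClassifier

# Hypothetical multilabel data: labels[i] is the set of correct tags for X[i].
X = np.array([[1, 3, 4, 6],
              [2, 5, 5, 5]], dtype=float)
labels = [{0, 1, 2}, {1}]  # sample 0 has three tags, sample 1 has one

# Make k copies of each sample with k correct labels; each copy gets
# exactly one of those labels as its single multiclass target.
rows, targets = [], []
for x, tags in zip(X, labels):
    for tag in tags:
        rows.append(x)
        targets.append(tag)
train_data = np.array(rows)
train_labels = np.array(targets)

clf = XGBClassifier(n_estimators=50)  # illustrative params
clf.fit(train_data, train_labels)
pred_proba = clf.predict_proba(X)     # one probability column per class
print(pred_proba.shape)               # (2, 3)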
