Is it possible to use xgboost for multilabel classification? Currently I use OneVsRestClassifier over GradientBoostingClassifier from sklearn. It works, but it uses only one core of my CPU. My data has ~45 features, and the task is to predict about 20 columns of binary (boolean) data. The metric is mean average precision (map@7). If you have a short example of code to share, that would be great.
You can add a label to each class you want to predict: reshape your data by appending to each input row a label that says which output column it refers to, and xgboost should learn how to treat that label accordingly (a sketch follows below). This way you will have a 1-dimensional Y, but you can still predict many labels.
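A minimal sketch of that reshaping, under my reading of the answer (the array names, shapes, and random data are my own illustration, not from the original):

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 45))           # ~45 features, as in the question
Y = rng.integers(0, 2, size=(100, 20))   # ~20 binary output columns

n_samples, n_labels = X.shape[0], Y.shape[1]

# Duplicate every row once per output column, appending that column's index
# as an extra feature; the target becomes that column's binary value.
label_id = np.repeat(np.arange(n_labels), n_samples).reshape(-1, 1)
X_flat = np.hstack([np.tile(X, (n_labels, 1)), label_id])  # (2000, 46)
y_flat = Y.T.reshape(-1)                                   # 1-dimensional Y

clf = XGBClassifier(n_jobs=4)
clf.fit(X_flat, y_flat)

# To predict output column j for a new point x, append j and read P(1).
x_new = np.hstack([X[0], [3]]).reshape(1, -1)
print(clf.predict_proba(x_new)[:, 1])
```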
There are a couple of ways to do that, one of which is the one you already suggested:
1. Wrap an XGBClassifier in scikit-learn's one-vs-rest wrapper (a fuller, self-contained version appears after this list):

```python
clf_multilabel = OneVsRestClassifier(XGBClassifier(**params))
```

clf_multilabel will fit one binary classifier per class, and it will use however many cores you specify in params (fyi, you can also specify n_jobs in OneVsRestClassifier, but that eats up more memory).

2. If you first massage your data a little by making k copies of every data point that has k correct labels, you can hack your way to a simpler multiclass problem. At that point, just fit a single multiclass XGBClassifier and call predict_proba to get classification margins/probabilities for each class, then decide what threshold you want for predicting a label (a sketch follows below). Note that this solution is not exact: if a product has tags (1, 2, 3), you artificially introduce two negative samples for each class.
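For approach 1, a minimal self-contained sketch (the data shapes mirror the question; the params values are placeholders of my own, not from the answer):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 45))               # ~45 features, as in the question
Y = rng.integers(0, 2, size=(100, 20))       # ~20 binary target columns

params = {"n_estimators": 100, "n_jobs": 4}  # n_jobs is what parallelizes each xgboost fit
clf_multilabel = OneVsRestClassifier(XGBClassifier(**params))
clf_multilabel.fit(X, Y)                     # one binary XGBClassifier per target column

proba = clf_multilabel.predict_proba(X[:5])  # shape (5, 20): per-column probabilities
```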
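And a sketch of the k-copies trick in approach 2, under my reading of the answer (the 0.1 threshold and the random data are arbitrary illustrations):

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 45))
Y = rng.integers(0, 2, size=(100, 20))

# k copies of each point with k positive labels: one (row, class) pair per positive entry.
rows, classes = np.nonzero(Y)
X_multi, y_multi = X[rows], classes      # multiclass target = class index

clf = XGBClassifier(n_jobs=4)            # the sklearn wrapper infers a multiclass objective
clf.fit(X_multi, y_multi)

proba = clf.predict_proba(X[:5])         # shape (5, 20): per-class probabilities
pred = proba > 0.1                       # pick whatever threshold suits map@7
```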