I've got a 3-class classification model trained with XGBoost. The next step is to take the tree model (printed by xgb.dump()) and use it in a .NET production system. I really do not understand how I can get a 3-dimensional vector of probabilities from the single value in each leaf:
<code>
[1107] "booster[148]""0:[f24<1.5] yes=1,no=2,missing=1"
[1109] "1:[f4<0.085] yes=3,no=4,missing=3""3:leaf=0.00624765"
[1111] "4:leaf=-0.0208106""2:[f4<0.115] yes=5,no=6,missing=5"
[1113] "5:leaf=0.14725""6:leaf=0.0102657"
</code>
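(The dump above comes from R's xgb.dump(); for reference, a roughly equivalent sketch in the Python API, using throwaway data just to show the call, would be:)
<code>
import numpy as np
import xgboost as xgb

# Throwaway data, only to produce a dump in the same text format as above.
X = np.random.rand(100, 25)
y = np.random.randint(0, 3, size=100)
bst = xgb.train({"objective": "multi:softprob", "num_class": 3},
                xgb.DMatrix(X, label=y), num_boost_round=2)

# get_dump() returns one text tree per booster.
for i, tree in enumerate(bst.get_dump()):
    print(f"booster[{i}]")
    print(tree)
</code>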
P.S. Calling a Python function from .NET is not a good idea due to speed limitations.
This took a while to figure out. Once you have your tree dump, the steps to follow are:
First, figure out the leaf values for each booster. The boosters come in round-robin order: the first booster is class 0, the next is class 1, the next is class 2, then class 0 again, and so on. So if your num_round is 10, you will see 30 boosters.
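For example, here is a minimal sketch of that booster-to-class bookkeeping, assuming `bst` is a trained Booster as in the snippet above:
<code>
num_class = 3
trees = bst.get_dump()          # one text tree per booster
trees_by_class = {c: [] for c in range(num_class)}

for i, tree in enumerate(trees):
    # Round-robin layout: booster i belongs to class i % num_class,
    # and to boosting round i // num_class.
    trees_by_class[i % num_class].append(tree)

# num_round rounds => num_round trees per class, num_round * num_class total.
print(len(trees), [len(t) for t in trees_by_class.values()])
</code>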
Next, be careful about "missing". If you have not explicitly set a missing value in the DMatrix, xgb can treat the value 0 as missing. So when you walk down your tree, you may need to jump to the node x denoted by missing=x whenever the feature value at that node is 0. One way around this confusing behaviour is to set an explicit missing value in the DMatrix for both training and prediction. I used a value that cannot possibly occur in my data, and I also replaced NA-type values with some (non-zero) value before training or predicting. Of course, 0 can genuinely mean missing for you, in which case that's OK. You might actually notice this issue with categorical features that take only the values 1 or 0 in your data, while the tree node has a ridiculous condition on a very small negative number, etc.
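A minimal sketch of pinning the missing marker down explicitly; SENTINEL, X_raw and y are hypothetical names for your own marker and data:
<code>
import numpy as np
import xgboost as xgb

SENTINEL = -999999.0   # hypothetical marker, impossible in the real data

# Replace NA-type values with the sentinel before both train and predict,
# and tell the DMatrix that the sentinel (not 0) means "missing".
X = np.where(np.isnan(X_raw), SENTINEL, X_raw)
dtrain = xgb.DMatrix(X, label=y, missing=SENTINEL)
bst = xgb.train({"objective": "multi:softprob", "num_class": 3},
                dtrain, num_boost_round=3)
</code>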
Let's say you have 3 rounds. Then you will end up with leaf values l1_0, l2_0, l3_0 for class 0, l1_1, l2_1, l3_1 for class 1, and l1_2, l2_2, l3_2 for class 2.

Now, a great way of making sure you are getting the logic right is to turn on output_margin and pred_leaf, one at a time. With pred_leaf on, you will get a matrix showing exactly which leaf you should have hit for each of your classes, for a single instance. With output_margin on, you will get the sum of the leaf values which xgb is calculating.
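In the Python API those two checks look roughly like this, continuing the hypothetical `bst` and SENTINEL from above:
<code>
dtest = xgb.DMatrix(X, missing=SENTINEL)

# pred_leaf: which leaf each instance lands in, one column per booster,
# shape (n_instances, num_round * num_class).
leaves = bst.predict(dtest, pred_leaf=True)

# output_margin: the raw per-class sums before any transformation,
# shape (n_instances, num_class).
margins = bst.predict(dtest, output_margin=True)

print(leaves[0])   # compare with your own walk down the dumped trees
print(margins[0])  # the sums you must reproduce from the leaf values
</code>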
Now, this sum is 0.5 + l1_0 + l2_0 + l3_0 for class 0, and so on; here 0.5 is the bias. You can cross-verify these sums against the predict response with output_margin on. Now say you got v0, v1 and v2 as the bias + leaf-value sums. Then your probability for class 0 is the softmax of these margins: exp(v0) / (exp(v0) + exp(v1) + exp(v2)), and likewise for classes 1 and 2.
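Putting it together, a minimal sketch of that final step; the leaf sums passed in are hypothetical values that would come from your own walk down the dumped trees:
<code>
import numpy as np

BASE_SCORE = 0.5   # xgboost's default bias (the base_score parameter)

def softprob(leaf_sums):
    """leaf_sums[c] = sum of the leaf values hit for class c,
    e.g. l1_0 + l2_0 + l3_0 for class 0."""
    v = np.asarray(leaf_sums) + BASE_SCORE   # v0, v1, v2
    e = np.exp(v - v.max())                  # max-shift for numerical stability
    return e / e.sum()                       # exp(v_c) / sum_j exp(v_j)

# Hypothetical per-class leaf sums for one instance:
print(softprob([0.11, -0.03, 0.27]))
# Should match bst.predict(dtest) for the same instance.
</code>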