I am using the gbm function in R (gbm package) to fit stochastic gradient boosting models for multiclass classification. I am simply trying to obtain the importance of each predictor separately for each class, as in the figure on p. 382 of the Hastie book (The Elements of Statistical Learning).
However, the function summary.gbm only returns the overall importance of the predictors (their importance averaged over all classes).
Does anyone know how to get the relative importance values for each class?
I think the short answer is that on page 379, Hastie mentions that he uses MART, which appears to only be available for Splus.
I agree that the gbm package doesn't seem to allow for seeing the separate relative influence. If that's something you're interested in for a multiclass problem, you could probably get something pretty similar by building a one-vs-all gbm for each of your classes and then getting the importance measures from each of those models.
So say your classes are a, b, c, & d. You model a vs. the rest and get the importance from that model. Then you model b vs. the rest and get the importance from that model. Etc.
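A rough sketch of that one-vs-all idea is below. The helper name one_vs_all_importance and its arguments are made up for illustration, and it assumes the gbm package is installed and that every column other than the response is a predictor. Treat it as a starting point rather than a tuned implementation.

```r
# Hypothetical helper: fit one binary gbm per class ("this class vs. the rest")
# and collect the relative influence table from each fit.
one_vs_all_importance <- function(data, response, n.trees = 100, ...) {
  classes <- levels(factor(data[[response]]))
  predictors <- setdiff(names(data), response)
  result <- lapply(classes, function(cl) {
    d <- data[predictors]
    # The bernoulli distribution in gbm expects a 0/1 numeric response.
    d$is_target <- as.numeric(data[[response]] == cl)
    fit <- gbm::gbm(is_target ~ ., data = d, distribution = "bernoulli",
                    n.trees = n.trees, ...)
    # plotit = FALSE returns the var / rel.inf table without drawing a bar chart.
    summary(fit, n.trees = n.trees, plotit = FALSE)
  })
  names(result) <- classes
  result
}

# Example call (iris ships with R):
# imp <- one_vs_all_importance(iris, "Species", n.trees = 50)
# imp$setosa
```

Note that these are importances from separate binary models, so they are similar in spirit to per-class importances from a single multinomial fit but not numerically the same thing.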
Hopefully this function helps you. For the example I used data from the ElemStatLearn package. The function figures out what the classes for a column are, splits the data into these classes, runs the gbm() function on each class and plots the bar plots for these models.
I did some digging into how the gbm package calculates importance. It is based on the ErrorReduction column, which is contained in the trees element of the result and can be accessed with pretty.gbm.trees(). Relative influence is obtained by summing this ErrorReduction over all trees for each variable. For a multiclass problem there are actually n.trees*num.classes trees in the model. So if there are 3 classes, you can sum the ErrorReduction for each variable over every third tree to get the importance for one class. I have written the following functions to implement this and then plot the results:

Get Variable Importance By Class
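A minimal base R sketch of this calculation follows. The function names are hypothetical, and it assumes that the trees of a multinomial fit are stored cycling through the classes (tree 1 for class 1, tree 2 for class 2, and so on), which is what the "every third tree" description implies. It also assumes the fitted object carries a classes element with the class labels; if it does not, generic column names are kept.

```r
# tree_tables: list of data frames shaped like pretty.gbm.trees() output,
# i.e. with columns SplitVar (zero-based, -1 for terminal nodes) and
# ErrorReduction. Returns a predictors x classes matrix of summed reductions.
sum_error_reduction_by_class <- function(tree_tables, var_names, num_classes) {
  out <- matrix(0, nrow = length(var_names), ncol = num_classes,
                dimnames = list(var_names,
                                paste0("class_", seq_len(num_classes))))
  for (i in seq_along(tree_tables)) {
    tr <- tree_tables[[i]]
    # Assumption: tree i belongs to class ((i - 1) mod num_classes) + 1.
    k <- (i - 1) %% num_classes + 1
    splits <- tr[tr$SplitVar != -1, , drop = FALSE]
    if (nrow(splits) == 0) next
    red <- tapply(splits$ErrorReduction, splits$SplitVar, sum)
    idx <- as.integer(names(red)) + 1  # shift zero-based SplitVar to R indexing
    out[idx, k] <- out[idx, k] + as.numeric(red)
  }
  out
}

# Thin wrapper for a gbm object fitted with distribution = "multinomial".
rel_inf_by_class <- function(object) {
  tables <- lapply(seq_along(object$trees), function(i) {
    gbm::pretty.gbm.trees(object, i.tree = i)
  })
  imp <- sum_error_reduction_by_class(tables, object$var.names,
                                      object$num.classes)
  # Use the class labels if the object stores them (assumed field name).
  if (length(object$classes) == ncol(imp)) colnames(imp) <- object$classes
  imp
}
```

With a fitted multinomial model fit, rel_inf_by_class(fit) gives one column per class, and rowSums() of that matrix gives the totals over all classes. The values are raw sums; dividing each column by its sum and multiplying by 100 would put them on a percentage scale.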
Plot Variable Importance By Class
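A plotting helper along these lines might look like the sketch below. It assumes ggplot2 and gridExtra are installed and takes a numeric matrix with one row per predictor and one column per class (row and column names set). The function name and the n_top and ncol arguments are hypothetical.

```r
# Hypothetical helper: one bar chart per class, each sorted by its own
# importance values, arranged in a grid.
plot_importance_by_class <- function(imp, n_top = 10, ncol = 2) {
  plots <- lapply(colnames(imp), function(cl) {
    vals <- sort(setNames(imp[, cl], rownames(imp)), decreasing = TRUE)
    vals <- head(vals, n_top)  # keep only the n_top most important predictors
    # Reverse the factor levels so the largest bar ends up on top after flipping.
    df <- data.frame(var = factor(names(vals), levels = rev(names(vals))),
                     rel_inf = as.numeric(vals))
    ggplot2::ggplot(df, ggplot2::aes(x = var, y = rel_inf)) +
      ggplot2::geom_col() +
      ggplot2::coord_flip() +
      ggplot2::labs(title = cl, x = NULL, y = "Relative influence")
  })
  # Separate plots instead of facets, so each panel keeps its own ordering.
  gridExtra::grid.arrange(grobs = plots, ncol = ncol)
  invisible(plots)
}
```

Building a separate plot per class is what lets each panel have its own bar order; a single faceted plot would share one factor ordering across all panels.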
In my real use for this I have over 40 features, so I give an option to specify the number of features to plot. I also couldn't use faceting if I wanted the plots to be sorted separately for each class, which is why I used gridExtra.

Try It
This seems to give the same results as the built-in relative.influence function if you sum the results over all the classes.