I am trying to explain the decisions made by an H2O GBM model, based on the idea in https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211. I want to calculate the contribution of each feature to a given prediction at test time. Is it possible to get each individual tree from the ensemble along with the log-odds at every node? I will also need the path the model traverses in each tree while making the prediction.
Answer 1:
H2O doesn't have an equivalent of the xgboostExplainer package. However, there is a way to get something close.
1) If you want to know which decision path was taken for a single row/observation, you can use h2o.predict_leaf_node_assignment(model, frame)
to get an H2OFrame with the leaf node assignments. It contains one column per tree (in the example below you can see that 5 trees were built), showing the path taken through each tree:
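To make the path strings concrete, here is a minimal sketch of walking a tree with one of those "L"/"R" path strings (one character per split). The toy tree structure and feature names are invented for illustration; they are not H2O's internal representation, which you would instead read out of the MOJO.

```python
# Each internal node is (split_feature, left_child, right_child);
# each leaf is a float log-odds value. This toy tree is hypothetical.
toy_tree = ("age",
            ("income", -0.4, 0.1),
            ("income", 0.2, 0.7))

def walk(tree, path):
    """Follow the 'L'/'R' characters down the tree, recording the
    features used along the way and returning the leaf value."""
    features = []
    node = tree
    for step in path:
        feat, left, right = node
        features.append(feat)
        node = left if step == "L" else right
    return features, node  # node is now a leaf value

feats, leaf_value = walk(toy_tree, "LR")
print(feats, leaf_value)  # ['age', 'income'] 0.1
```

The path string tells you which features were consulted for this row; combined with per-node values (step 3 below) it lets you attribute the prediction to those features.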
2) You can visualize individual trees using H2O's MOJO, which you can download once you've built your GBM or XGBoost model. The result will look something like the following:
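For reference, the MOJO can be rendered with the PrintMojo tool that ships inside h2o.jar; the snippet below only assembles the command line (the file names are placeholders for whatever h2o.download_mojo gave you). The resulting .gv file can then be passed through Graphviz's dot to get an image.

```python
# Placeholder paths; substitute your own MOJO zip and h2o.jar location.
mojo_path = "gbm_model.zip"   # e.g. from h2o.download_mojo(model)
cmd = ["java", "-cp", "h2o.jar", "hex.genmodel.tools.PrintMojo",
       "--tree", "0",          # index of the tree to render
       "-i", mojo_path,
       "-o", "tree0.gv"]       # Graphviz dot file to produce
print(" ".join(cmd))
```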
3) In an upcoming release you will be able to get the prediction value for each leaf node of the GBM (the pull request for this is here).
Putting all these steps together should get you pretty close to the values you want, which you can then add up to get the impact of each individual feature. (For a Python Jupyter notebook with examples of how to generate the leaf node assignments and visualize a tree, look here.)
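The "adding up" step can be sketched as follows, using the xgboostExplainer idea: as a row descends one tree, the change in node value (log-odds) at each split is credited to the feature that made the split, and the credits are summed per feature across all trees. The node values and feature names here are made up; in practice they would come from the MOJO plus the leaf-node path for the row.

```python
from collections import defaultdict

def path_contributions(node_values, split_features):
    """node_values: value at the root, then at each node down to the leaf.
    split_features: the feature deciding each step (one fewer than values).
    Returns {feature: summed log-odds change along this path}."""
    contrib = defaultdict(float)
    for parent, child, feat in zip(node_values, node_values[1:], split_features):
        contrib[feat] += child - parent
    return dict(contrib)

# Hypothetical path: root value 0.0, then 0.5 after an 'age' split,
# then 0.25 at the leaf after an 'income' split.
print(path_contributions([0.0, 0.5, 0.25], ["age", "income"]))
```

Repeating this for every tree and summing the per-feature dictionaries gives a decomposition whose total equals the model's raw (log-odds) prediction minus the root baseline.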