scikit-learn Decision trees Regression: retrieve a

Posted 2019-07-29 10:13

Question:

I have started using scikit-learn Decision Trees and so far they are working out quite well, but one thing I need to do is retrieve the set of sample Y values for the leaf node, especially when running a prediction. That is, given an input feature vector X, I want to know the set of corresponding Y values at the leaf node instead of just the regression value, which is the mean (or median) of those values. Of course one would want the sample mean to have a small variance, but I do want to extract the actual set of Y values and do some statistics or create a PDF. I have used code like that in "how to extract the decision rules from scikit-learn decision-tree?" to print the decision tree, but the 'value' it outputs is a single float representing the mean. I have a large dataset, so I limit the leaf size to e.g. 100, and I want to access those 100 values...
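For reference, the lookup described above can be sketched with the apply method of a fitted DecisionTreeRegressor, which returns the index of the leaf each sample lands in. This is only a minimal sketch under assumptions: the synthetic data, the helper name leaf_targets, and the choice of min_samples_leaf=100 are illustrative and not taken from the original question.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, purely illustrative (assumption, not from the question)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=2000)

# Each leaf must hold at least 100 training samples
reg = DecisionTreeRegressor(min_samples_leaf=100, random_state=0).fit(X, y)

# apply() returns, for every row, the node index of the leaf it ends up in
train_leaves = reg.apply(X)

def leaf_targets(x_new):
    """Hypothetical helper: training Y values in the leaf that x_new falls into."""
    leaf_id = reg.apply(np.atleast_2d(x_new))[0]
    return y[train_leaves == leaf_id]

values = leaf_targets([3.0])
prediction = reg.predict([[3.0]])[0]
# With the default squared-error criterion the prediction is the leaf mean
print(len(values), values.mean(), prediction)
```

The returned array can then be passed to a histogram or density estimate. The training X and y have to be kept around, since the fitted tree itself stores only per-node summaries rather than the individual Y values.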

Answer 1:

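Another solution is to use an (undocumented?) feature of the fitted sklearn DecisionTreeRegressor object, namely .tree_.impurity (note the trailing underscore on tree_). It returns the impurity of every node, indexed by node id. With the default squared-error criterion this is the variance of the Y values at each node, so the standard deviation per leaf is its square root.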
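A minimal sketch of reading that attribute, assuming the default squared-error criterion and synthetic data that is not part of the original answer. It looks up the leaf for one query point, reads the impurity there, and cross-checks it against the spread of the training Y values in the same leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, purely illustrative (assumption, not from the answer)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=2000)

reg = DecisionTreeRegressor(min_samples_leaf=100, random_state=0).fit(X, y)

# Node index of the leaf that the query point falls into
leaf_id = reg.apply([[3.0]])[0]

# tree_.impurity is indexed by node id; for squared error it is the variance
leaf_variance = reg.tree_.impurity[leaf_id]
leaf_std = np.sqrt(leaf_variance)

# Cross-check against the training targets that share this leaf
train_leaves = reg.apply(X)
empirical_std = y[train_leaves == leaf_id].std()  # population std (ddof=0)
print(leaf_std, empirical_std)
```

This gives only a summary of the spread. If the full set of Y values is needed, for example to build a PDF, the training targets still have to be grouped by leaf index.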