How to get prediction p-values of an XGBClassifier

Posted 2019-08-01 22:38

I'd like to know how confident an XGBClassifier is for each prediction it makes. Is it possible to have such a value? Or is the predict_proba already indirectly the confidence of the model?

Answer 1:

Your intuition is indeed correct: predict_proba returns the probability of each example being of a given class; from the docs:

predict_proba(data, output_margin=False, ntree_limit=0)

Predict the probability of each data example being of a given class.

This probability in turn is routinely interpreted in practice as the confidence of the prediction.
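
As a minimal sketch of that interpretation (not taken from the docs), the helper below takes the array returned by predict_proba and gives back, for each row, the index of the most probable class together with its probability. The names top_class_confidence and xgb_demo, the synthetic dataset, and the hyperparameters are illustrative assumptions; xgb_demo also assumes that xgboost and scikit-learn are installed.

```python
import numpy as np


def top_class_confidence(proba):
    """Return (predicted class index, probability of that class) per row.

    `proba` is an (n_samples, n_classes) array, such as the output of
    XGBClassifier.predict_proba. The index refers to a column of `proba`,
    i.e. a position in the fitted model's `classes_` attribute.
    """
    proba = np.asarray(proba, dtype=float)
    predicted = proba.argmax(axis=1)   # column of the most probable class
    confidence = proba.max(axis=1)     # its probability, read as "confidence"
    return predicted, confidence


def xgb_demo():
    """Hypothetical end-to-end example on synthetic data."""
    # Imported here so the helper above stays usable without these packages.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

    model = XGBClassifier(n_estimators=50, max_depth=3)
    model.fit(X_train, y_train)

    predicted, confidence = top_class_confidence(model.predict_proba(X_test))
    for label, conf in zip(predicted[:5], confidence[:5]):
        print(f"class index {label}, probability {conf:.3f}")


# Hand-written probabilities, just to show the shape of the result.
example = np.array([[0.90, 0.10], [0.35, 0.65]])
example_pred, example_conf = top_class_confidence(example)
```

Calling xgb_demo() prints the first five test predictions with their probabilities. A value close to 1 means the model strongly favours that class; a value close to 1/n_classes means it can barely tell the classes apart.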

That said, this is an ad-hoc, practical interpretation, and it has nothing to do with p-values or any other measure of statistical rigour; generally speaking and AFAIK, there are no such measures available for this (and similar) machine learning techniques.

On a more general level, you may be interested to know that p-values themselves have been quickly falling out of favor among statisticians; some quick links:

The ASA's Statement on p-Values: Context, Process, and Purpose (American Statistician)
Statisticians issue warning over misuse of P values (Nature)
The problems with p-values are not just with p-values (Andrew Gelman @ American Statistician)
The problem with p-values (Towards Data Science blog post)