scikit weighted f1 score calculation and usage

I have a question regarding the weighted average in sklearn.metrics.f1_score:

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

First, is there any reference that justifies the usage of weighted-F1? I am just curious in which cases I should use weighted-F1.

Second, I heard that weighted-F1 is deprecated. Is that true?

Third, how is weighted-F1 actually calculated? For example:

{
    "0": {
        "TP": 2,
        "FP": 1,
        "FN": 0,
        "F1": 0.8
    },
    "1": {
        "TP": 0,
        "FP": 2,
        "FN": 2,
        "F1": -1
    },
    "2": {
        "TP": 1,
        "FP": 1,
        "FN": 2,
        "F1": 0.4
    }
}

How do I calculate the weighted-F1 of the above example? I thought it should be something like (0.8*2/3 + 0.4*1/3)/3, but I was wrong.

Tags: machine-learning nlp scikit-learn precision-recall

1 answer

萌系小妹纸

#2 · 2019-07-11 06:45

First, is there any reference that justifies the usage of weighted-F1? I am just curious in which cases I should use weighted-F1.

I don't have any references, but if you're interested in multi-class or multi-label classification where you care about the precision/recall of all classes, then the weighted F1 score is appropriate. If you have binary classification where you only care about the positive samples, then it is probably not appropriate.
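To make the effect of the weighting concrete, here is a minimal sketch in plain Python that compares a macro average with a support-weighted average. The label names, per-label F1 values, supports, and helper functions are all made up for illustration; none of them come from the question or from scikit-learn.

```python
def macro_average_f1(f1_by_label):
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_by_label.values()) / len(f1_by_label)


def weighted_average_f1(f1_by_label, support_by_label):
    """Mean of the per-label F1 scores, weighted by each label's support."""
    total_support = sum(support_by_label.values())
    return sum(
        f1_by_label[label] * support_by_label[label] for label in f1_by_label
    ) / total_support


# Hypothetical, imbalanced two-label problem (illustrative numbers only).
f1_by_label = {"common": 0.9, "rare": 0.3}
support_by_label = {"common": 90, "rare": 10}

macro = macro_average_f1(f1_by_label)
weighted = weighted_average_f1(f1_by_label, support_by_label)
print(f"macro={macro:.2f} weighted={weighted:.2f}")
```

With these made-up numbers the macro average is about 0.6 and the weighted average is about 0.84, so the weighted score tracks the frequent label more closely.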

Second, I heard that weighted-F1 is deprecated, is it true?

No, weighted-F1 itself is not being deprecated. Only some aspects of the function interface were deprecated, back in v0.16, and then only to make it more explicit in previously ambiguous situations. (See the historical discussion on GitHub, or check out the source code and search the page for "deprecated" to find details.)

Third, how is weighted-F1 actually calculated?

From the documentation of f1_score:

``'weighted'``:
  Calculate metrics for each label, and find their average, weighted
  by support (the number of true instances for each label). This
  alters 'macro' to account for label imbalance; it can result in an
  F-score that is not between precision and recall.

So the average is weighted by the support, which is the number of true samples with a given label. Your example data does not list the support explicitly, but every true instance of a label is either found (TP) or missed (FN), so the support of each label is TP + FN and can be recovered from the counts you listed.
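As a rough sketch of the weighting, assuming the support of a label is TP + FN (every true instance is either found or missed), the counts from the question can be plugged in as follows. The helper names are hypothetical and this is plain Python, not scikit-learn's implementation.

```python
# Per-label counts copied from the question's example.
counts = {
    "0": {"TP": 2, "FP": 1, "FN": 0},
    "1": {"TP": 0, "FP": 2, "FN": 2},
    "2": {"TP": 1, "FP": 1, "FN": 2},
}


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 if the denominator is zero."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def weighted_f1_from_counts(counts):
    """Average the per-label F1 scores, weighted by support."""
    # Assumption: support (number of true instances) = TP + FN.
    supports = {label: c["TP"] + c["FN"] for label, c in counts.items()}
    total_support = sum(supports.values())
    return sum(
        f1_from_counts(c["TP"], c["FP"], c["FN"]) * supports[label]
        for label, c in counts.items()
    ) / total_support


per_label_f1 = {
    label: f1_from_counts(c["TP"], c["FP"], c["FN"])
    for label, c in counts.items()
}
result = weighted_f1_from_counts(counts)
print(per_label_f1, result)
```

Under that assumption the supports are 2, 2 and 3, so the sum is (0.8*2 + 0*2 + 0.4*3)/7, which is about 0.4. Note that the formula gives an F1 of 0 for label "1", not -1.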
