How to calculate the F1 measure in multi-label classification

Posted 2019-07-21 06:58

Question:

I am working on a sentence category detection problem, where each sentence can belong to multiple categories. For example:

"It has great sushi and even better service."
True Label:  [[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  1.]]
Pred Label:  [[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  1.]]
Correct Prediction!
Output:  ['FOOD#QUALITY' 'SERVICE#GENERAL'] 
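
For reference, here is a minimal sketch of how such an indicator vector can be decoded back to category names. The category list below is a placeholder; only positions 5 and 11 are known from the example output above:

import numpy as np

# Placeholder category names; only positions 5 and 11 are taken
# from the example output above.
categories = np.array(['CAT0', 'CAT1', 'CAT2', 'CAT3', 'CAT4',
                       'FOOD#QUALITY', 'CAT6', 'CAT7', 'CAT8',
                       'CAT9', 'CAT10', 'SERVICE#GENERAL'])

pred = np.array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1.]])

# Select the names of the active labels.
print(categories[pred[0].astype(bool)])  # ['FOOD#QUALITY' 'SERVICE#GENERAL']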

I have implemented a classifier that can predict multiple categories. I have 587 sentences in total that belong to multiple categories. I have calculated the accuracy scores in two ways:

Were all labels of an example predicted correctly or not (exact match)?

Code:

print "<------------ZERO one ERROR------------>" 
print "Total Examples:",(truePred+falsePred) ,"True Pred:",truePred, "False Pred:", falsePred, "Accuracy:", truePred/(truePred+falsePred)

Output:

<------------Zero-one error------------>
Total Examples: 587 True Pred: 353 False Pred: 234 Accuracy: 0.60136286201
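
For what it's worth, this exact-match (zero-one) accuracy can also be computed directly with scikit-learn: for multi-label indicator arrays, accuracy_score returns subset accuracy, counting a sample as correct only if every label matches. A minimal sketch with toy arrays:

import numpy as np
from sklearn.metrics import accuracy_score

# Toy indicator arrays: 3 samples, 4 labels.
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 1, 0],   # exact match
                   [0, 1, 1, 0],   # one wrong label -> sample counts as wrong
                   [1, 1, 0, 1]])  # exact match

print(accuracy_score(y_true, y_pred))  # 0.666...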

How many individual labels were correctly predicted across all examples?

Code:

print "\n<------------Correct and inccorrect predictions------------>"
print "Total Labels:",len(total[0]),"Predicted Labels:", corrPred, "Accuracy:", corrPred/len(total[0])

Output:

<------------Correct and incorrect predictions------------>
Total Labels: 743 Predicted Labels: 522 Accuracy: 0.702557200538
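
Note that this label-level metric only counts true labels that were recovered, so it is effectively micro-averaged recall (false positives are not penalized). A minimal sketch of the same count with numpy, assuming binary indicator arrays:

import numpy as np

y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 1, 0],
                   [1, 1, 0, 1]])

# True labels that were also predicted (true positives).
corr_pred = np.logical_and(y_true == 1, y_pred == 1).sum()
total_labels = int(y_true.sum())
print(total_labels, corr_pred, corr_pred / total_labels)  # 6 5 0.833...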

Problem: These are all accuracy scores obtained by comparing the predicted labels with the ground-truth labels. But I also want to calculate the F1 score (using micro averaging), precision, and recall. I have the ground-truth labels and need to match my predictions against them, but I don't know how to tackle this type of multi-label classification problem. Can I use scikit-learn or some other Python library?

Answer 1:

Have a look at the metrics already available in sklearn and make sure you understand them. Not all of them support multi-class multi-label classification, so you can either write your own or map each category combination to a single label:

[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  1.] => 0
[ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.  0.  0.] => 1
...
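
A minimal sketch of that mapping, assigning each distinct label combination its own class ID (variable names here are illustrative):

import numpy as np

y_multilabel = np.array([[0, 0, 1, 1],
                         [0, 1, 0, 0],
                         [0, 0, 1, 1]])

# Map each distinct label combination to a single class ID.
combo_to_id = {}
y_single = []
for row in y_multilabel:
    key = tuple(row)
    if key not in combo_to_id:
        combo_to_id[key] = len(combo_to_id)
    y_single.append(combo_to_id[key])

print(y_single)  # [0, 1, 0]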

You have to understand what this solution implies: if an example has 4 labels and you predict 3 of the 4 correctly, accuracy_score counts that prediction exactly the same as one with 0 of the 4 correct.

Either way, it counts as one error.

Here is an example:

>>> from sklearn.metrics import accuracy_score
>>> y_pred = [0, 2, 1, 3]
>>> y_true = [0, 1, 2, 3]
>>> accuracy_score(y_true, y_pred)
0.5
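
If you want partial credit per label instead of all-or-nothing, one option is scikit-learn's hamming_loss, which reports the fraction of individual labels that are wrong; a minimal sketch:

import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0]])

print(hamming_loss(y_true, y_pred))  # 1 wrong label out of 8 -> 0.125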


Answer 2:

I built a matrix of predicted labels, predictedlabel, and I already had the correct categories in y_test to compare my results against. So I tried the following code:

from sklearn.metrics import classification_report
from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score

print "Classification report: \n", (classification_report(y_test, predictedlabel))
print "F1 micro averaging:",(f1_score(y_test, predictedlabel, average='micro'))
print "ROC: ",(roc_auc_score(y_test, predictedlabel))

and I got the following results:

        precision    recall  f1-score   support

      0       0.74      0.93      0.82        57
      1       0.00      0.00      0.00         3
      2       0.57      0.38      0.46        21
      3       0.75      0.75      0.75        12
      4       0.44      0.68      0.54        22
      5       0.81      0.93      0.87       226
      6       0.57      0.54      0.55        48
      7       0.71      0.38      0.50        13
      8       0.70      0.72      0.71       142
      9       0.33      0.33      0.33        33
     10       0.42      0.52      0.47        21
     11       0.80      0.91      0.85       145

avg / total   0.71      0.78      0.74       743

 F1 micro averaging: 0.746153846154
 ROC:  0.77407943841
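
To see what average='micro' is doing, the same numbers can be reproduced by pooling true positives, false positives, and false negatives over all labels. A sketch, assuming y_test and predictedlabel are the binary indicator arrays used above:

import numpy as np

# Assumes y_test and predictedlabel are binary indicator arrays
# of shape (n_samples, n_labels), as in the code above.
tp = np.logical_and(y_test == 1, predictedlabel == 1).sum()
fp = np.logical_and(y_test == 0, predictedlabel == 1).sum()
fn = np.logical_and(y_test == 1, predictedlabel == 0).sum()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)  # should match the micro-averaged scores above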

So, this is how I am calculating my results!