I am working on a sentence category detection problem, where each sentence can belong to multiple categories. For example:
"It has great sushi and even better service."
True Label: [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]]
Pred Label: [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]]
Correct Prediction!
Output: ['FOOD#QUALITY' 'SERVICE#GENERAL']
I have implemented a classifier that can predict multiple categories. I have a total of 587 sentences that can belong to multiple categories. I have calculated the accuracy in two ways:
Exact match: were all labels of an example predicted correctly or not?
Code:
print "<------------ZERO one ERROR------------>"
print "Total Examples:",(truePred+falsePred) ,"True Pred:",truePred, "False Pred:", falsePred, "Accuracy:", truePred/(truePred+falsePred)
Output:
<------------ZERO one ERROR------------>
Total Examples: 587 True Pred: 353 False Pred: 234 Accuracy: 0.60136286201
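To make this first measure concrete, here is a minimal sketch of how the exact-match counts could be obtained. The toy `y_true` / `y_pred` lists and the helper `exact_match_counts` are hypothetical and only illustrate the idea; they are not the code that produced the output above.

```python
# Hypothetical toy data: one row per sentence, one column per category.
y_true = [[0, 1, 0, 1],
          [1, 0, 0, 0],
          [0, 0, 1, 1]]
y_pred = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 0, 1, 1]]


def exact_match_counts(y_true, y_pred):
    """Return (true_pred, false_pred): rows whose label vectors match fully vs. not."""
    true_pred = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    false_pred = len(y_true) - true_pred
    return true_pred, false_pred


true_pred, false_pred = exact_match_counts(y_true, y_pred)
# float() keeps the division from truncating under Python 2.
accuracy = float(true_pred) / (true_pred + false_pred)
print("True Pred: %d False Pred: %d Accuracy: %.3f" % (true_pred, false_pred, accuracy))
```

A sentence counts as correct only if every one of its labels matches, so a single wrong label makes the whole example a false prediction.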
Per label: how many labels were correctly predicted across all examples?
Code:
print "\n<------------Correct and inccorrect predictions------------>"
print "Total Labels:",len(total[0]),"Predicted Labels:", corrPred, "Accuracy:", corrPred/len(total[0])
Output:
<------------Correct and inccorrect predictions------------>
Total Labels: 743 Predicted Labels: 522 Accuracy: 0.702557200538
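A sketch of this label-level count, assuming `corrPred` is the number of true labels that were also predicted and `len(total[0])` is the number of true labels overall (both are guesses from the output above). The helper `label_hits` and the toy data are hypothetical.

```python
# Hypothetical toy data: one row per sentence, one column per category.
y_true = [[0, 1, 0, 1],
          [1, 0, 0, 0],
          [0, 0, 1, 1]]
y_pred = [[0, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 0, 1, 1]]


def label_hits(y_true, y_pred):
    """Return (correct, total): true labels that were also predicted, and all true labels."""
    correct = 0
    total = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1:
                total += 1
                if p == 1:
                    correct += 1
    return correct, total


corr_pred, total_labels = label_hits(y_true, y_pred)
# float() keeps the division from truncating under Python 2.
label_accuracy = float(corr_pred) / total_labels
print("Total Labels: %d Predicted Labels: %d Accuracy: %.3f"
      % (total_labels, corr_pred, label_accuracy))
```

Under that assumption, the ratio only looks at labels that are actually present in the ground truth, so extra (wrong) predicted labels do not lower it.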
Problem: Both of these are accuracy scores calculated by comparing the predictions with the ground-truth labels. But I also want to calculate the F1 score (using micro-averaging), precision, and recall. I have the ground-truth labels and need to match my predictions against them, but I don't know how to tackle this for a multi-label classification problem. Can I use scikit-learn or any other Python library?
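For reference, here is a minimal sketch of what micro-averaging means for label vectors like the ones above: true positives, false positives, and false negatives are pooled over every sentence and every category before precision, recall, and F1 are computed. The helper `micro_prf` and the toy data are hypothetical, and the sketch uses plain Python rather than any particular library.

```python
# Hypothetical toy data: one row per sentence, one column per category.
y_true = [[0, 1, 0, 1],
          [1, 0, 0, 0],
          [0, 0, 1, 1]]
y_pred = [[0, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 0, 1, 1]]


def micro_prf(y_true, y_pred):
    """Return micro-averaged (precision, recall, f1) for 0/1 label matrices."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1          # label present and predicted
            elif t == 0 and p == 1:
                fp += 1          # label predicted but not present
            elif t == 1 and p == 0:
                fn += 1          # label present but missed
    # Guard against empty denominators; float() avoids Python 2 truncation.
    precision = float(tp) / (tp + fp) if (tp + fp) else 0.0
    recall = float(tp) / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


precision, recall, f1 = micro_prf(y_true, y_pred)
print("Precision: %.3f Recall: %.3f F1: %.3f" % (precision, recall, f1))
```

Because the counts are pooled before dividing, frequent categories weigh more in the result than rare ones; averaging per-category scores instead would be macro-averaging.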