Accuracy, precision, recall and F-score are measures of system quality in machine learning. They all depend on a confusion matrix of True/False Positives/Negatives.
Given a binary classification task, I have tried the following to get a function that returns accuracy, precision, recall and f-score:
gold = [1] + [0] * 9
predicted = [1] * 10
def evaluation(gold, predicted):
    true_pos = sum(1 for p,g in zip(predicted, gold) if p==1 and g==1)
    true_neg = sum(1 for p,g in zip(predicted, gold) if p==0 and g==0)
    false_pos = sum(1 for p,g in zip(predicted, gold) if p==1 and g==0)
    false_neg = sum(1 for p,g in zip(predicted, gold) if p==0 and g==1)
    try:
        recall = true_pos / float(true_pos + false_neg)
    except:
        recall = 0
    try:
        precision = true_pos / float(true_pos + false_pos)
    except:
        precision = 0
    try:
        fscore = 2*precision*recall / (precision + recall)
    except:
        fscore = 0
    try:
        accuracy = (true_pos + true_neg) / float(len(gold))
    except:
        accuracy = 0
    return accuracy, precision, recall, fscore
But it seems like I have redundantly looped through the dataset 4 times to get the True/False Positives/Negatives.
Also, the multiple try-excepts to catch the ZeroDivisionError are a little redundant.
So what is the pythonic way to get the counts of the True/False Positives/Negatives without multiple loops through the dataset?
How do I pythonically catch the ZeroDivisionError without the multiple try-excepts?
I could also do the following to count the True/False Positives/Negatives in one loop, but is there an alternative way without the multiple if statements?
for p,g in zip(predicted, gold):
    if p==1 and g==1:
        true_pos+=1
    if p==0 and g==0:
        true_neg+=1
    if p==1 and g==0:
        false_pos+=1
    if p==0 and g==1:
        false_neg+=1
I would use a collections.Counter over the zipped prediction/gold pairs, which does roughly what you're doing with all of the ifs at the end (you should be using elifs, as your conditions are mutually exclusive). Then, with the result stored in counts, e.g.
true_pos = counts[1, 1]
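As a minimal sketch of the Counter idea, assuming the pairs are keyed as (predicted, gold): the function below counts every pair in one pass and then reads the four cells of the confusion matrix out of the counter. The helper name safe_div is hypothetical and not part of any library; it checks the denominator before dividing, so no try/except is needed.

```python
from collections import Counter


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def evaluation(gold, predicted):
    # One pass over the data: count each (predicted, gold) pair.
    counts = Counter(zip(predicted, gold))
    true_pos = counts[1, 1]
    true_neg = counts[0, 0]
    false_pos = counts[1, 0]
    false_neg = counts[0, 1]

    recall = safe_div(true_pos, true_pos + false_neg)
    precision = safe_div(true_pos, true_pos + false_pos)
    fscore = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(true_pos + true_neg, len(gold))
    return accuracy, precision, recall, fscore


gold = [1] + [0] * 9
predicted = [1] * 10
accuracy, precision, recall, fscore = evaluation(gold, predicted)
```

A Counter returns 0 for missing keys, so pairs that never occur (here, any pair with a predicted 0) need no special handling.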
For a start, you should (almost) never use a bare except:. If you're catching ZeroDivisionErrors, then write except ZeroDivisionError. You could also consider a "look before you leap" approach, checking whether the denominator is 0 before trying the division.
Depending on your needs, there are several libraries that will calculate precision, recall, F-score, etc. One that I have used is scikit-learn. Assuming that you have aligned lists of actual and predicted values, the metrics take only a function call or two. One of the advantages of using this library is that different flavors of metrics (such as micro-averaging, macro-averaging, weighted, binary, etc.) come free out of the box.
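For illustration, here is one way the scikit-learn route might look on the question's data. The choice of accuracy_score and precision_recall_fscore_support with average="binary" is an assumption about which functions suit this task, not necessarily the exact calls the answer had in mind.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

gold = [1] + [0] * 9
predicted = [1] * 10

# Fraction of instances where the prediction matches the gold label.
accuracy = accuracy_score(gold, predicted)

# average="binary" reports the scores for the positive class (label 1) only;
# the fourth return value (support) is not needed here.
precision, recall, fscore, _ = precision_recall_fscore_support(
    gold, predicted, average="binary"
)
```

Switching the average argument to "micro", "macro" or "weighted" gives the other flavors mentioned above.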
This is a pretty natural use case for the bitarray package.
There's some type conversion overhead, but after that, the bitwise operations are much faster.
For 100 instances, timeit on my PC gives 0.036 for your method and 0.017 using bitarray at 1000 passes. For 1000 instances, it goes to 0.291 and 0.093. For 10000, 3.177 and 0.863. You get the idea.
It scales pretty well, using no loops, and doesn't have to store a large intermediate representation by building a temporary list of tuples in zip.
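A rough sketch of the bitwise idea, using plain Python integers as bitmasks rather than the bitarray package itself, so it runs without extra dependencies. The names to_mask, popcount and confusion_counts are hypothetical, and this version will not reproduce the timings quoted above; it only shows how AND and NOT on packed labels yield the four counts.

```python
def to_mask(labels):
    """Pack a sequence of 0/1 labels into one integer, one bit per instance."""
    mask = 0
    for i, label in enumerate(labels):
        if label:
            mask |= 1 << i
    return mask


def popcount(mask):
    """Number of set bits in a non-negative integer."""
    return bin(mask).count("1")


def confusion_counts(gold, predicted):
    n = len(gold)
    all_bits = (1 << n) - 1  # n ones; keeps the result of ~ within n bits
    g = to_mask(gold)
    p = to_mask(predicted)
    true_pos = popcount(p & g)
    false_pos = popcount(p & ~g & all_bits)
    false_neg = popcount(~p & g & all_bits)
    true_neg = popcount(~p & ~g & all_bits)
    return true_pos, true_neg, false_pos, false_neg


gold = [1] + [0] * 9
predicted = [1] * 10
true_pos, true_neg, false_pos, false_neg = confusion_counts(gold, predicted)
```

The conversion in to_mask still loops once over each list, which corresponds to the type conversion overhead mentioned above; after that, each count is a single bitwise expression.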