I started using the cocoapi to evaluate a model trained with the Object Detection API.
After reading various sources that explain mean average precision (mAP) and recall, I am confused by the "maximum detections" parameter used in the cocoapi.
From what I understand (e.g. here, here or here), mAP is computed by calculating precision and recall at various model score thresholds. This gives the precision-recall curve, and mAP is an approximation of the area under this curve. Expressed differently, it is the average, over the recall levels 0, 0.1, ..., 1 (0:0.1:1), of the maximum precision reached at or above each level.
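A minimal sketch of that threshold-sweep view, assuming the usual 11-point interpolation for a single class (`interpolated_ap` is a hypothetical helper, not part of the cocoapi). Sorting detections by descending score is the same as lowering the score threshold one step at a time; nothing here limits how many detections are counted.

```python
def interpolated_ap(scores, is_tp, num_gt, num_levels=11):
    """Interpolated AP for one class, 11-point (PASCAL VOC style) by default.

    scores: confidence of each detection
    is_tp:  True if that detection matched a ground-truth object
    num_gt: number of ground-truth objects
    """
    # Admitting detections in descending score order == sweeping the threshold down.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    curve = []  # (recall, precision) after each detection is admitted
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        curve.append((tp / num_gt, tp / (tp + fp)))
    levels = [k / (num_levels - 1) for k in range(num_levels)]  # 0, 0.1, ..., 1.0
    total = 0.0
    for r in levels:
        # Maximum precision at any recall >= r (0 if recall r is never reached).
        total += max((p for rec, p in curve if rec >= r), default=0.0)
    return total / num_levels
```

For example, `interpolated_ap([0.9, 0.8, 0.7, 0.6], [True, False, True, True], num_gt=4)` averages a precision of 1.0 at the three lowest recall levels, 0.75 up to recall 0.7, and 0 beyond the final recall of 0.75.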
However, the cocoapi seems to calculate precision and recall only for a given maximum number of detections (maxDet) with the highest scores, and from there get the precision-recall curve for maxDets = 1, 10, 100. Why is this a good metric, given that it is clearly not the same as the method above (it potentially excludes data points)?
In my example, I have ~3000 objects per image. Evaluating the result with the cocoapi gives terrible recall because it limits the number of detected objects to 100 per image.
For testing purposes, I feed in the evaluation dataset both as the ground truth and as the detected objects (with some artificial scores). I would expect precision and recall to be very good, which is indeed what happens. But as soon as I feed in more than 100 objects, precision and recall go down as the number of "detected objects" increases, even though they are all "correct"! How does that make sense?
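To see why adding more "correct" detections lowers the score, here is a toy sketch of what I understand the cocoapi to do for one image and one category. Assumptions: no IoU matching or crowd handling (true/false-positive flags are given), and a single image, whereas the cocoapi truncates per image and then pools across images. Only the `max_det` highest-scoring detections are kept; precision is made non-increasing and sampled at 101 recall thresholds 0:0.01:1, and any threshold the kept detections never reach contributes 0. `coco_style_ap` is a hypothetical helper.

```python
def coco_style_ap(scores, is_tp, num_gt, max_det, num_rec_thrs=101):
    """Toy COCO-style AP for one image and one category.

    Returns (ap, final_recall). Hypothetical helper, simplified as noted above.
    """
    # Keep only the max_det highest-scoring detections.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:max_det]
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: make precision non-increasing from right to left.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    rec_thrs = [i / (num_rec_thrs - 1) for i in range(num_rec_thrs)]  # 0, 0.01, ..., 1
    total = 0.0
    for r in rec_thrs:
        # First point whose recall reaches r; unreachable thresholds add 0.
        idx = next((k for k, rec in enumerate(recalls) if rec >= r), None)
        if idx is not None:
            total += precisions[idx]
    final_recall = recalls[-1] if recalls else 0.0
    return total / num_rec_thrs, final_recall


# Feed n perfect detections for n ground-truth objects with maxDet = 100:
# AP stays at 1.0 up to n = 100, then drops because recall is capped at 100 / n.
for n in (50, 100, 300, 3000):
    scores = [1.0 - i / 10000 for i in range(n)]
    ap, rec = coco_style_ap(scores, [True] * n, num_gt=n, max_det=100)
    print(n, round(ap, 3), round(rec, 3))
```

With ~3000 objects per image, only the recall thresholds up to 100/3000 are reached, which matches the behaviour described above.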
I came to the conclusion that this is just the way the cocoapi defines its metric. It probably makes sense in their context, but I can just as well define my own (which is what I did), based on the articles I read and linked above.
You can change the maxDets parameter and define a new summarize() instance method.
Let's create a COCOeval object:
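from pycocotools.cocoeval import COCOeval

# cocoGt / cocoDt are COCO objects for ground truth and detections,
# annType is e.g. 'bbox', and imgIdsDt are the image ids to evaluate.
cocoEval = COCOeval(cocoGt, cocoDt, annType)
cocoEval.params.maxDets = [200]  # a single limit: up to 200 detections per image
cocoEval.params.imgIds = imgIdsDt
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize_2()  # instead of cocoEval.summarize(), which indexes maxDets[1] and maxDets[2]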
Now, define a summarize_2() method in the COCOeval class of the cocoeval.py module in the following way:
def summarize_2(self):
    # Copy everything from the `summarize()` method here, except
    # the nested function `_summarizeDets()`, which is replaced below.
    def _summarizeDets():
        # Only one maxDets value is set, so every metric uses maxDets[0].
        # stats[7] and stats[8] (AR at the 2nd and 3rd maxDets in the stock
        # version) stay 0 so the usual 12-slot layout is kept.
        stats = np.zeros((12,))
        stats[0] = _summarize(1, maxDets=self.params.maxDets[0])
        stats[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[0])
        stats[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[0])
        stats[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[0])
        stats[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[0])
        stats[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[0])
        stats[6] = _summarize(0, maxDets=self.params.maxDets[0])
        stats[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[0])
        stats[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[0])
        stats[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[0])
        return stats
    # Copy the rest of `summarize()` here (the remaining nested helpers and the
    # lines that pick the summarizer by iouType and store the result in self.stats).
If you run the above method over your dataset, you will get output similar to this:
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=200 ] = 0.507
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=200 ] = 0.699
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=200 ] = 0.575
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=200 ] = 0.586
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=200 ] = 0.519
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=200 ] = 0.501
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=200 ] = 0.598
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=200 ] = 0.640
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=200 ] = 0.566
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=200 ] = 0.564
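If you would rather not edit cocoeval.py inside the installed package, one alternative is a standalone function that reads the arrays accumulate() stores in cocoEval.eval. This is a sketch, assuming the layouts described in pycocotools' accumulate() (eval['precision'] as [T, R, K, A, M], eval['recall'] as [T, K, A, M], with -1 for entries that have no ground truth) and exactly one maxDets value. `summarize_single_maxdet` is a hypothetical name, and unlike pycocotools it matches IoU thresholds with np.isclose instead of exact float equality.

```python
import numpy as np


def summarize_single_maxdet(coco_eval):
    """Print and return the 10 COCO detection metrics for one maxDets value.

    Hypothetical helper. Assumes coco_eval.accumulate() has already run and
    coco_eval.params.maxDets holds exactly one value, e.g. [200].
    """
    p = coco_eval.params
    max_det = int(p.maxDets[0])
    m = 0  # index of that single maxDets value on the last axis
    precision = coco_eval.eval['precision']  # shape [T, R, K, A, M]
    recall = coco_eval.eval['recall']        # shape [T, K, A, M]
    fmt = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'

    def _metric(ap, iou_thr=None, area='all'):
        a = list(p.areaRngLbl).index(area)
        # Drop the area and maxDets axes: [T, R, K] for AP, [T, K] for AR.
        s = precision[..., a, m] if ap else recall[..., a, m]
        if iou_thr is not None:
            # Keep only the requested IoU threshold (first axis).
            s = s[np.isclose(p.iouThrs, iou_thr)]
        valid = s[s > -1]  # -1 marks entries without ground truth
        value = float(np.mean(valid)) if valid.size else -1.0
        iou_str = (f'{p.iouThrs[0]:0.2f}:{p.iouThrs[-1]:0.2f}'
                   if iou_thr is None else f'{iou_thr:0.2f}')
        title, kind = ('Average Precision', '(AP)') if ap else ('Average Recall', '(AR)')
        print(fmt.format(title, kind, iou_str, area, max_det, value))
        return value

    # Same 10 metrics, in the same order, as the output above.
    specs = [
        (True, None, 'all'), (True, 0.5, 'all'), (True, 0.75, 'all'),
        (True, None, 'small'), (True, None, 'medium'), (True, None, 'large'),
        (False, None, 'all'), (False, None, 'small'),
        (False, None, 'medium'), (False, None, 'large'),
    ]
    return np.array([_metric(*spec) for spec in specs])
```

Call it after cocoEval.accumulate() with `stats = summarize_single_maxdet(cocoEval)`. It returns 10 values rather than the 12-slot array, so the AR entries start at stats[6].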