I started using the cocoapi to evaluate a model trained using the Object Detection API. After reading various sources that explain mean average precision (mAP) and recall, I am confused by the "maximum detections" parameter used in the cocoapi.
From what I understood (e.g. here, here or here), one calculates mAP by computing precision and recall for various model score thresholds. This gives the precision-recall curve, and mAP is calculated as an approximation of the area under this curve, or, expressed differently, as the average of the maximum precision at defined recall levels (0:0.1:1).
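For concreteness, here is a minimal sketch of that 11-point interpolation; the function name and the toy precision/recall values are made up for illustration:

    import numpy as np

    def interpolated_ap(recall, precision, recall_levels=np.linspace(0, 1, 11)):
        # average of the maximum precision reachable at each recall level
        recall = np.asarray(recall)
        precision = np.asarray(precision)
        ap = 0.0
        for r in recall_levels:
            mask = recall >= r
            ap += (precision[mask].max() if mask.any() else 0.0) / len(recall_levels)
        return ap

    # three operating points obtained from three score thresholds (toy numbers)
    print(interpolated_ap(recall=[0.2, 0.5, 0.9], precision=[1.0, 0.8, 0.6]))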
However, the cocoapi seems to calculate precision and recall for a given number of maximum detections (maxDets) with the highest scores, and from there get the precision-recall curve for maxDets = 1, 10, 100. Why is this a good metric, since it is clearly not the same as the above method (it potentially excludes data points)?
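As far as I can tell, per image the detections are sorted by score and everything beyond the maxDets limit is dropped before matching, roughly like this (a simplified sketch, not the actual cocoapi code):

    import numpy as np

    def keep_top_detections(dets, max_det):
        # dets: list of dicts with a 'score' key, one per detection in an image;
        # only the max_det highest-scoring detections survive, so anything
        # beyond the limit can never contribute to recall
        order = np.argsort([-d['score'] for d in dets], kind='mergesort')
        return [dets[i] for i in order[:max_det]]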
In my example, I have ~ 3000 objects per image. Evaluating the result using the cocoapi gives terrible recall because it limits the number of detected objects to 100.
For testing purposes, I feed the evaluation dataset as the ground truth and as the detected objects (with some artificial scores). I would expect precision and recall to be pretty good, which is actually the case. But as soon as I feed in more than 100 objects, precision and recall go down with an increasing number of "detected objects", even though they are all "correct". How does that make sense?
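The test setup looks roughly like this; the file name is a placeholder and the constant score is arbitrary:

    import json
    from pycocotools.coco import COCO

    cocoGt = COCO('instances_gt.json')  # placeholder ground-truth file

    # turn every ground-truth annotation into a "detection" with an artificial score
    fake_dets = [
        {
            'image_id': ann['image_id'],
            'category_id': ann['category_id'],
            'bbox': ann['bbox'],
            'score': 0.9,
        }
        for ann in cocoGt.dataset['annotations']
    ]

    with open('fake_detections.json', 'w') as f:
        json.dump(fake_dets, f)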
You can change the maxDets parameter and define a new summarize() instance method.

Let's create a COCOeval object:
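Something along these lines should do it; 'instances_gt.json' and 'detections.json' are placeholder file names, and 3000 is just an example limit chosen to be larger than the number of objects per image:

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    cocoGt = COCO('instances_gt.json')           # ground-truth annotations
    cocoDt = cocoGt.loadRes('detections.json')   # detection results in COCO format

    cocoEval = COCOeval(cocoGt, cocoDt, iouType='bbox')
    # raise the per-image detection cap (the default is [1, 10, 100])
    cocoEval.params.maxDets = [1, 10, 3000]
    cocoEval.evaluate()
    cocoEval.accumulate()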
Now, define a summarize_2() method in the cocoeval.py module in the following way:
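A sketch that mirrors the structure of the built-in summarize() helper; the maxDets value it reports on must be one of the values you set in params.maxDets before calling evaluate() and accumulate():

    import numpy as np  # already present at the top of cocoeval.py

    def summarize_2(self):
        # like summarize(), but reports AP/AR for the largest value in
        # self.params.maxDets instead of the fixed 1/10/100
        def _summarize(ap=1, iouThr=None, areaRng='all', maxDets=100):
            p = self.params
            iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
            titleStr = 'Average Precision' if ap == 1 else 'Average Recall'
            typeStr = '(AP)' if ap == 1 else '(AR)'
            iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \
                if iouThr is None else '{:0.2f}'.format(iouThr)
            aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
            mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
            if ap == 1:
                s = self.eval['precision']          # [TxRxKxAxM]
                if iouThr is not None:
                    s = s[np.where(iouThr == p.iouThrs)[0]]
                s = s[:, :, :, aind, mind]
            else:
                s = self.eval['recall']             # [TxKxAxM]
                if iouThr is not None:
                    s = s[np.where(iouThr == p.iouThrs)[0]]
                s = s[:, :, aind, mind]
            mean_s = -1 if len(s[s > -1]) == 0 else np.mean(s[s > -1])
            print(iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s))
            return mean_s

        max_det = self.params.maxDets[-1]
        self.stats = np.array([
            _summarize(1, maxDets=max_det),
            _summarize(1, iouThr=.5, maxDets=max_det),
            _summarize(1, iouThr=.75, maxDets=max_det),
            _summarize(0, maxDets=max_det),
        ])

With the COCOeval object from above, call cocoEval.summarize_2() after cocoEval.accumulate(). If you would rather not edit cocoeval.py, attaching the function from outside with COCOeval.summarize_2 = summarize_2 works as well.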
If you run the above method over your dataset, you will get output lines in the familiar summarize() format, but reported for the maxDets values you set instead of the default 1, 10 and 100.
I came to the conclusion that this is just the way the cocoapi defines its metric. It probably makes sense in their context, but I can just as well define my own (which is what I did), based on the articles I read and linked above.