In testing an object detection algorithm in large images, we check our detected bounding boxes against the coordinates given for the ground truth rectangles.
According to the Pascal VOC challenges, there's this:
A predicted bounding box is considered correct if it overlaps more
than 50% with a ground-truth bounding box, otherwise the bounding box
is considered a false positive detection. Multiple detections are
penalized. If a system predicts several bounding boxes that overlap
with a single ground-truth bounding box, only one prediction is
considered correct, the others are considered false positives.
This means that we need to calculate the percentage of overlap. Does this mean that the ground truth box is 50% covered by the detected boundary box? Or that 50% of the bounding box is absorbed by the ground truth box?
I've searched but I haven't found a standard algorithm for this - which is surprising because I would have thought that this is something pretty common in computer vision. (I'm new to it). Have I missed it? Does anyone know what the standard algorithm is for this type of problem?
For axis-aligned bounding boxes it is relatively simple:
def get_iou(bb1, bb2):
"""
Calculate the Intersection over Union (IoU) of two bounding boxes.
Parameters
----------
bb1 : dict
Keys: {'x1', 'x2', 'y1', 'y2'}
The (x1, y1) position is at the top left corner,
the (x2, y2) position is at the bottom right corner
bb2 : dict
Keys: {'x1', 'x2', 'y1', 'y2'}
The (x, y) position is at the top left corner,
the (x2, y2) position is at the bottom right corner
Returns
-------
float
in [0, 1]
"""
assert bb1['x1'] < bb1['x2']
assert bb1['y1'] < bb1['y2']
assert bb2['x1'] < bb2['x2']
assert bb2['y1'] < bb2['y2']
# determine the coordinates of the intersection rectangle
x_left = max(bb1['x1'], bb2['x1'])
y_top = max(bb1['y1'], bb2['y1'])
x_right = min(bb1['x2'], bb2['x2'])
y_bottom = min(bb1['y2'], bb2['y2'])
if x_right < x_left or y_bottom < y_top:
return 0.0
# The intersection of two axis-aligned bounding boxes is always an
# axis-aligned bounding box
intersection_area = (x_right - x_left) * (y_bottom - y_top)
# compute the area of both AABBs
bb1_area = (bb1['x2'] - bb1['x1']) * (bb1['y2'] - bb1['y1'])
bb2_area = (bb2['x2'] - bb2['x1']) * (bb2['y2'] - bb2['y1'])
# compute the intersection over union by taking the intersection
# area and dividing it by the sum of prediction + ground-truth
# areas - the interesection area
iou = intersection_area / float(bb1_area + bb2_area - intersection_area)
assert iou >= 0.0
assert iou <= 1.0
return iou
Explanation
Images are from this answer
I found that the conceptual answer is here:
http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/htmldoc/devkit_doc.html#SECTION00054000000000000000
from this thread:
Compare two bounding boxes with each other Matlab
I should be able to code this in python!
For the intersection distance, shouldn't we add a +1 so as to have
intersection_area = (x_right - x_left + 1) * (y_bottom - y_top + 1)
(same for the AABB)
Like on this pyimage search post
I agree (x_right - x_left) x (y_bottom - y_top) works in mathematics with point coordinates but since we deal with pixels it is I think different.
Consider a 1D example :
- 2 points : x1 = 1 and x2 = 3, the distance is indeed x2-x1 = 2
- 2 pixels of index : i1 = 1 and i2 = 3, the segment from pixel i1 to i2 contains 3 pixels ie l = i2 - i1 + 1
In the snippet below, I construct a polygon along the edges of the first box. I then use Matplotlib to clip the polygon to the second box. The resulting polygon contains four vertices, but we are only interested in the top left and bottom right corners, so I take the max and the min of the coordinates to get a bounding box, which is returned to the user.
import numpy as np
from matplotlib import path, transforms
def clip_boxes(box0, box1):
path_coords = np.array([[box0[0, 0], box0[0, 1]],
[box0[1, 0], box0[0, 1]],
[box0[1, 0], box0[1, 1]],
[box0[0, 0], box0[1, 1]]])
poly = path.Path(np.vstack((path_coords[:, 0],
path_coords[:, 1])).T, closed=True)
clip_rect = transforms.Bbox(box1)
poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]
return np.array([np.min(poly_clipped, axis=0),
np.max(poly_clipped, axis=0)])
box0 = np.array([[0, 0], [1, 1]])
box1 = np.array([[0, 0], [0.5, 0.5]])
print clip_boxes(box0, box1)
how about this approach? Could be extended to any number of unioned shapes
surface = np.zeros([1024,1024])
surface[1:1+10, 1:1+10] += 1
surface[100:100+500, 100:100+100] += 1
unionArea = (surface==2).sum()
print(unionArea)