I'm trying to perform object detection with R-CNN on my own dataset, following the tutorial on the MATLAB webpage. Based on the picture below:
I'm supposed to put image paths in the first column and the bounding box of each object in the following columns. But in each of my images, there is more than one object of each kind. For example there are 20 vehicles in one image. How should I deal with that? Should I create a separate row for each instance of vehicle in an image?
The example found on the website finds the pixel neighbourhood with the largest score and draws a bounding box around that region of the image. When you have multiple objects, that complicates things. There are two approaches you can use to find multiple objects.

One approach would be to keep every detection whose score surpasses some threshold, so the number of objects reported can vary from image to image. An alternative approach would be to choose some value k and display the top k bounding boxes associated with the k highest scores. This of course requires that you know the value of k beforehand, and unlike the thresholding approach it always assumes that you have found exactly k objects in the image.

In addition to the above logic, the approach you state, where you create a separate row for each instance of a vehicle in the image, is correct. If you have multiple instances of an object in a single image, you introduce one row per instance while keeping the image filename the same. Therefore, if you had for example 20 vehicles in one image, you would create 20 rows in your table where the filename is the same in each row and each row contains a single bounding box specification for one distinct object in that image.
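As a rough sketch of what that table might look like (the filenames and coordinates here are made up purely for illustration), one row per object instance:

```matlab
% Hypothetical training table: one row per object instance, with the
% image filename repeated for each object found in that image.
imageFilename = {'scene1.jpg'; 'scene1.jpg'; 'scene1.jpg'};
vehicle = {[100 50 40 30]; [200 80 45 35]; [310 60 38 28]};  % [x y width height]
trainingData = table(imageFilename, vehicle);
```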
Once you have done this, and assuming that you have already trained the R-CNN detector and want to use it, the original code from the website to detect objects is the following:
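A sketch of that example, based on the MATLAB R-CNN stop-sign tutorial (the `rcnn` variable is the trained detector and the test image comes from that example):

```matlab
% Read the test image shipped with the MATLAB R-CNN example
testImage = imread('stopSignTest.jpg');

% Run the detector; each row of bboxes is one candidate detection
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128);

% Keep only the single detection with the highest score
[score, idx] = max(score);
bbox = bboxes(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

% Draw the box and show the result
outputImage = insertObjectAnnotation(testImage, 'rectangle', bbox, annotation);
figure
imshow(outputImage)
```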
This only works for the one object with the highest score. If you wanted to do this for multiple objects, you would use the `score` that is output from the `detect` method and find those locations that satisfy either situation 1 (thresholding) or situation 2 (top k). If you had situation 1, you would modify the code to look like the following.
Note that I've kept the original bounding boxes, labels and scores in their original variables, while storing the subset that surpassed the threshold in separate variables, in case you want to cross-reference between the two. If you wanted to accommodate situation 2, the code remains the same as in situation 1 with the exception of how the detections are selected.
The line that thresholds the scores would now change to selecting the k highest scores instead.
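A minimal sketch of that change, assuming k = 20 objects per image (replace k with whatever count fits your data):

```matlab
% Situation 2 (top k): replace the thresholding line
%   idx = score >= T;
% with a sort that keeps the k highest-scoring detections.
k = 20;                            % assumed number of objects per image
[~, ord] = sort(score, 'descend'); % rank detections by confidence
idx = ord(1:min(k, numel(score))); % guard against fewer than k detections
```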
The end result will be multiple bounding boxes of the detected objects in the image - one annotation per detected object.
I think you actually have to put all of the coordinates for that image as a single entry in your training data table. See this MATLAB tutorial for details. If you load the training data into MATLAB locally and check the `vehicleDataset` variable, you will actually see this (sorry, my score is not high enough to include images directly in my answers).

To summarize: in your training data table, make sure you have one unique entry for each image, and put however many bounding boxes there are into the corresponding category as a matrix, where each row is in the format `[x, y, width, height]`.
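A rough sketch of that one-row-per-image format (the filenames and coordinates are made up for illustration); each cell holds an M-by-4 matrix with one `[x, y, width, height]` row per object in that image:

```matlab
% Hypothetical training table: one row per image, with all of that
% image's bounding boxes stacked into a single matrix.
imageFilename = {'scene1.jpg'; 'scene2.jpg'};
vehicle = {[100 50 40 30; 200 80 45 35; 310 60 38 28]; ...  % 3 vehicles in scene1
           [ 60 40 50 36]};                                 % 1 vehicle in scene2
trainingData = table(imageFilename, vehicle);
```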