We're working on an image analysis project where we need to identify the objects that have disappeared from or appeared in a scene. Here are two images, one captured before an action was performed by the surgeon and the other afterwards.
First, we simply calculated the difference between the two images; here is the result (note that I added 128 to the result Mat just to get a nicer image):
The goal is to detect that the cup (red arrow) has disappeared from the scene and the syringe (black arrow) has entered it; in other words, we should detect ONLY the regions that correspond to objects that left or entered the scene. Also, the objects in the top left of the scene have clearly shifted a bit from their initial position. I thought about optical flow, so I used OpenCV C++ to compute Farneback's dense flow, in order to see whether it is enough for our case. Here is the result we got, followed by the code we wrote:
void drawOptFlowMap(const Mat& flow, Mat& cflowmap, int step, double /*scale*/, const Scalar& color)
{
    cout << flow.channels() << " / " << flow.rows << " / " << flow.cols << endl;
    // Draw one flow vector every `step` pixels.
    for(int y = 0; y < cflowmap.rows; y += step)
        for(int x = 0; x < cflowmap.cols; x += step)
        {
            const Point2f& fxy = flow.at<Point2f>(y, x);
            line(cflowmap, Point(x,y), Point(cvRound(x+fxy.x), cvRound(y+fxy.y)), color);
            circle(cflowmap, Point(x,y), 1, color, -1);
        }
}
void MainProcessorTrackingObjects::diffBetweenImagesToTestTrackObject(string pathOfImageCaptured, string pathOfImagesAfterOneAction, string pathOfResultsFolder)
{
    //Preprocessing step...

    string pathOfImageBefore = StringUtils::concat(pathOfImageCaptured, imageCapturedFileName);
    string pathOfImageAfter = StringUtils::concat(pathOfImagesAfterOneAction, *it);

    Mat imageBefore = imread(pathOfImageBefore);
    Mat imageAfter = imread(pathOfImageAfter);

    Mat imageResult = (imageAfter - imageBefore) + 128;
    // absdiff(imageAfter, imageBefore, imageResult);
    string imageResultPath = StringUtils::stringFormat("%s%s-color.png", pathOfResultsFolder.c_str(), fileNameWithoutFrameIndex.c_str());
    imwrite(imageResultPath, imageResult);

    Mat imageBeforeGray, imageAfterGray;
    // imread() returns BGR images, so CV_BGR2GRAY is the correct conversion code here.
    cvtColor(imageBefore, imageBeforeGray, CV_BGR2GRAY);
    cvtColor(imageAfter, imageAfterGray, CV_BGR2GRAY);

    Mat imageResultGray = (imageAfterGray - imageBeforeGray) + 128;
    // absdiff(imageAfterGray, imageBeforeGray, imageResultGray);
    string imageResultGrayPath = StringUtils::stringFormat("%s%s-gray.png", pathOfResultsFolder.c_str(), fileNameWithoutFrameIndex.c_str());
    imwrite(imageResultGrayPath, imageResultGray);

    //*** Compute Farneback optical flow
    Mat opticalFlow;
    calcOpticalFlowFarneback(imageBeforeGray, imageAfterGray, opticalFlow, 0.5, 3, 15, 3, 5, 1.2, 0);
    drawOptFlowMap(opticalFlow, imageBefore, 5, 1.5, Scalar(0, 255, 255));
    string flowPath = StringUtils::stringFormat("%s%s-flow.png", pathOfResultsFolder.c_str(), fileNameWithoutFrameIndex.c_str());
    imwrite(flowPath, imageBefore);

    break;
}
And to check how accurate this optical flow is, I wrote this small piece of code which calculates (IMAGEAFTER + FLOW) - IMAGEBEFORE:
//Reference method just to see the accuracy of the optical flow calculation
Mat accuracy = Mat::zeros(imageBeforeGray.rows, imageBeforeGray.cols, imageBeforeGray.type());
for(int y = 0; y < imageAfter.rows; y++)
    for(int x = 0; x < imageAfter.cols; x++)
    {
        const Point2f& fxy = opticalFlow.at<Point2f>(y, x);
        // Clamp the displaced coordinates so they stay inside the image.
        int yy = std::min(std::max(cvRound(y + fxy.y), 0), imageAfterGray.rows - 1);
        int xx = std::min(std::max(cvRound(x + fxy.x), 0), imageAfterGray.cols - 1);
        uchar intensityPointCalculated = imageAfterGray.at<uchar>(yy, xx);
        uchar intensityPointBefore = imageBeforeGray.at<uchar>(y, x);
        uchar intensityResult = ((intensityPointCalculated - intensityPointBefore) / 2) + 128;
        accuracy.at<uchar>(y, x) = intensityResult;
    }
string validationPixelBased = StringUtils::stringFormat("%s%s-validationPixelBased.png", pathOfResultsFolder.c_str(), fileNameWithoutFrameIndex.c_str());
imwrite(validationPixelBased, accuracy);
The only purpose of ((intensityPointCalculated - intensityPointBefore) / 2) + 128 is to produce a comprehensible image.
IMAGE RESULT:
Since it detects all the regions that have shifted, entered, or left the scene, we think optical flow is not enough to detect just the regions representing the objects that disappeared from or appeared in the scene. Is there any way to ignore the sparse motions detected by the optical flow? Or is there an alternative way to detect what we need?
To be clear, the goal here is to identify the regions containing appeared/disappeared objects, not the objects that are present in both pictures but have merely changed position.
Optical flow should be a good way to go, as you have already found. However, the issue is how the outcome is evaluated. As opposed to a pixel-to-pixel diff, which has no tolerance to rotation/scaling variance, you can do feature matching (SIFT etc.; check out here for what you can use with OpenCV).
Here's what I get with Good Features To Track from your "before" image.
Instead of dense optical flow, you could use a sparse flow and track only the features. The output includes FeaturesFound and Error values; I simply used a threshold here to distinguish features that moved from unmatched features that disappeared.
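For reference, a minimal sketch of that sparse-flow step, assuming the imageBeforeGray/imageAfterGray mats from your code; the corner count and the error threshold are made-up values that would need tuning:
// Minimal sketch of the sparse-flow step described above (not the exact code
// used for the images shown). The corner count (500) and the error
// threshold (20.0f) are only illustrative.
vector<Point2f> cornersBefore, cornersAfter;
goodFeaturesToTrack(imageBeforeGray, cornersBefore, 500, 0.01, 10);

vector<uchar> featuresFound;   // 1 if the feature could be tracked into the other image
vector<float> trackingError;   // matching error per feature
calcOpticalFlowPyrLK(imageBeforeGray, imageAfterGray,
                     cornersBefore, cornersAfter,
                     featuresFound, trackingError);

vector<Point2f> unmatched;     // candidates for changed (appeared/disappeared) regions
for (size_t i = 0; i < cornersBefore.size(); ++i)
{
    // Features that were not found, or found with a large error, are treated as
    // belonging to a changed region rather than to a merely shifted object.
    if (!featuresFound[i] || trackingError[i] > 20.0f)
        unmatched.push_back(cornersBefore[i]);
}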
The remaining incorrectly matched features can be filtered out. Here I used simple mean filtering plus thresholding to get the mask of the newly appeared region.
Then I find its convex hull to show the region in the original image (in yellow).
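A rough sketch of that filtering-plus-hull step, again assuming the unmatched points from the previous sketch; the kernel size, threshold and colour are just placeholders:
// Turn the unmatched feature points into a region mask, smooth it with a mean
// filter, threshold it, and draw the convex hull of what remains (in yellow).
Mat mask = Mat::zeros(imageBeforeGray.size(), CV_8UC1);
for (size_t i = 0; i < unmatched.size(); ++i)
    circle(mask, Point(cvRound(unmatched[i].x), cvRound(unmatched[i].y)), 5, Scalar(255), -1);

blur(mask, mask, Size(31, 31));                 // simple mean filtering
threshold(mask, mask, 50, 255, THRESH_BINARY);  // keep only dense clusters of points

vector<Point> nonZero;
findNonZero(mask, nonZero);
if (!nonZero.empty())
{
    vector<vector<Point> > hulls(1);
    convexHull(nonZero, hulls[0]);
    drawContours(imageBefore, hulls, 0, Scalar(0, 255, 255), 2);
}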
And simply do it in the reverse direction (matching from imageAfter to imageBefore) to get the regions that appeared. :)
You could try a two-pronged approach. The image-difference method is great at detecting objects which enter and exit the scene, as long as the colour of the object differs from the colour of the background. What strikes me is that it would be greatly improved if you could remove the objects that have merely moved before applying the method.
There is a great OpenCV method for object detection here which finds points of interest in an image for detecting the translation of an object. I think you could achieve what you want with the following method:
1 Compare the images with the OpenCV code and highlight the moving objects in both images
2 Colour in the detected objects with the background of the other picture at the same set of pixels (or something similar) to reduce the difference between the images caused by the moving objects
3 Compute the image difference, which should now contain the large major objects plus smaller artefacts left over from the moving objects
4 Threshold for a certain size of object detected in the image difference
5 Compile a list of likely candidates
There are other alternatives for object tracking, so there may be code you like more, but the process should be okay for what you are doing, I think.
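If it helps, here is a rough sketch of what steps 3-4 could look like in OpenCV, assuming imageBefore/imageAfter as in your code with the moved objects already painted over as in step 2; all numeric values are placeholders to tune:
// Steps 3-4: compute the difference image, threshold it, and keep only
// contours above a minimum area in order to drop the small artefacts.
Mat diff, diffGray, binary;
absdiff(imageAfter, imageBefore, diff);
cvtColor(diff, diffGray, CV_BGR2GRAY);
GaussianBlur(diffGray, diffGray, Size(5, 5), 0);
threshold(diffGray, binary, 30, 255, THRESH_BINARY);

vector<vector<Point> > contours;
findContours(binary, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

vector<Rect> candidates;                   // step 5: list of likely candidates
for (size_t i = 0; i < contours.size(); ++i)
{
    if (contourArea(contours[i]) > 500.0)  // minimum object size, needs tuning
        candidates.push_back(boundingRect(contours[i]));
}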
Here's what I tried:
The parameters may need tuning. I've used values that just worked for the two sample images. As feature detector/descriptor I've used SIFT (non-free). You can try other detectors and descriptors.
Difference Image:
Regions:
Changes (Red: insertion/removal, Yellow: sparse motion):
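For illustration, the SIFT detection/matching step could look roughly like this (OpenCV 2.4 nonfree module); the parameters here are typical defaults, not necessarily the values used for the results above:
// Rough illustration of SIFT keypoint detection and matching between the two
// images; the 0.75 ratio-test value is just the usual starting point.
#include <opencv2/nonfree/features2d.hpp>

SIFT sift;
vector<KeyPoint> kpBefore, kpAfter;
Mat descBefore, descAfter;
sift(imageBeforeGray, noArray(), kpBefore, descBefore);
sift(imageAfterGray,  noArray(), kpAfter,  descAfter);

BFMatcher matcher(NORM_L2);
vector<vector<DMatch> > knnMatches;
matcher.knnMatch(descBefore, descAfter, knnMatches, 2);

vector<DMatch> goodMatches;               // features present in both images
for (size_t i = 0; i < knnMatches.size(); ++i)
{
    // Lowe's ratio test: keep a match only if it is clearly better than the
    // second-best candidate.
    if (knnMatches[i].size() == 2 &&
        knnMatches[i][0].distance < 0.75f * knnMatches[i][1].distance)
        goodMatches.push_back(knnMatches[i][0]);
}
// Regions of the difference image containing few or no good matches are
// candidates for inserted/removed objects rather than objects that only moved.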