Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition

Posted 2018-12-31 14:20

One of the most interesting projects I've worked on in the past couple of years was an image processing project. The goal was to develop a system able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans'; you'll see why in a minute). You can see a sample below, with the recognized can outlined by the green rectangle, along with its detected scale and rotation.

[Image: template matching result]

Some constraints on the project:

  • The background could be very noisy.
  • The can could have any scale or rotation or even orientation (within reasonable limits).
  • The image could have some degree of fuzziness (contours might not be entirely straight).
  • There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
  • The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
  • The can could be partly hidden on the sides or the middle and possibly partly hidden behind a bottle.
  • There could be no can at all in the image, in which case you had to find nothing and write a message saying so.

So you could end up with tricky cases like this (which, in this instance, made my algorithm fail completely):

[Image: total failure case]

I did this project a while ago and had a lot of fun doing it, and I ended up with a decent implementation. Here are some details about my implementation:

Language: C++, using the OpenCV library.

Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to give to the algorithm, I used three steps (a minimal OpenCV sketch follows the list):

  1. Changing the color domain from RGB to HSV and filtering on a "red" hue, a saturation above a certain threshold to avoid orange-like colors, and a minimum value to avoid dark tones. The end result was a binary black-and-white image, where all white pixels represent the pixels matching these thresholds. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with. [Image: binarized image]
  2. Noise reduction using median filtering (replacing each pixel with the median value of its neighbors).
  3. Applying the Canny edge detection filter to get the contours of all items after the two preceding steps. [Image: contour detection]
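
Here is a minimal sketch of those three steps using standard OpenCV calls; the HSV thresholds are only illustrative values, not the tuned ones from the actual implementation:

    #include <opencv2/opencv.hpp>

    // Minimal sketch of the three pre-processing steps described above.
    // The HSV thresholds are illustrative values, not tuned ones.
    cv::Mat preprocess(const cv::Mat& bgr)
    {
        cv::Mat hsv;
        cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

        // 1. Keep only "red-ish" pixels. Red hue wraps around 0 in OpenCV's
        //    0-179 hue scale, so two ranges are combined.
        cv::Mat lowRed, highRed, mask;
        cv::inRange(hsv, cv::Scalar(0, 100, 60),   cv::Scalar(10, 255, 255),  lowRed);
        cv::inRange(hsv, cv::Scalar(170, 100, 60), cv::Scalar(179, 255, 255), highRed);
        cv::bitwise_or(lowRed, highRed, mask);      // binary black-and-white image

        // 2. Median filter to reduce salt-and-pepper noise.
        cv::medianBlur(mask, mask, 5);

        // 3. Canny edge detection to keep only the contours.
        cv::Mat edges;
        cv::Canny(mask, edges, 50, 150);
        return edges;
    }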

Algorithm: The algorithm I chose for this task was taken from this awesome book on feature extraction and is called the Generalized Hough Transform (quite different from the regular Hough Transform). It basically says a few things:

  • You can describe an object in space without knowing its analytical equation (which is the case here).
  • It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.
  • It uses a base model (a template) that the algorithm will "learn".
  • Each pixel remaining in the contour image will vote for another pixel which will supposedly be the center (in terms of gravity) of your object, based on what it learned from the model.

In the end, you get a heat map of the votes. For example, here all the pixels of the can's contour vote for its gravitational center, so you get a lot of votes in the same pixel corresponding to the center, and you see a peak in the heat map, as below:

[Image: GHT vote heat map]

Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (the final scale and rotation factors are obviously relative to your original template). In theory at least...
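
As a side note, newer OpenCV versions (3.0 and later) ship a built-in Generalized Hough implementation, so a sketch along these lines could stand in for a hand-rolled version; the file names and all parameter values below are illustrative assumptions that would need tuning:

    #include <opencv2/opencv.hpp>
    #include <cstdio>
    #include <vector>

    // Sketch using OpenCV's built-in Generalized Hough (the Guil variant
    // handles scale and rotation). Needs OpenCV >= 3.0.
    int main()
    {
        cv::Mat templ = cv::imread("can_template.png", cv::IMREAD_GRAYSCALE); // placeholder file
        cv::Mat scene = cv::imread("scene.png",        cv::IMREAD_GRAYSCALE); // placeholder file

        cv::Ptr<cv::GeneralizedHoughGuil> ght = cv::createGeneralizedHoughGuil();
        ght->setMinScale(0.5);      // narrower ranges run (much) faster
        ght->setMaxScale(2.0);
        ght->setScaleStep(0.05);
        ght->setMinAngle(0);
        ght->setMaxAngle(360);
        ght->setAngleStep(1);
        ght->setTemplate(templ);

        std::vector<cv::Vec4f> positions;   // each entry: (x, y, scale, angle)
        ght->detect(scene, positions);

        for (const cv::Vec4f& p : positions)
            std::printf("center=(%.0f, %.0f) scale=%.2f angle=%.1f\n", p[0], p[1], p[2], p[3]);
        return 0;
    }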

Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:

  • It is extremely slow! I can't stress this enough. Almost a full day was needed to process the 30 test images, obviously because I swept a very large range of scale and rotation factors, since some of the cans were very small.
  • It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, and thus had more pixels and more votes).
  • Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, resulting in a very noisy heat map.
  • Invariance to translation and rotation was achieved, but not to orientation, meaning that a can that was not directly facing the camera lens wasn't recognized.

Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?

I hope some people will learn something from this as well; after all, I think it shouldn't only be the people asking the questions who learn. :)

23 answers
何处买醉
Reply 2 · 2018-12-31 15:04

An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

It is implemented in OpenCV 2.3.1.

You can find a nice code example using such features in the OpenCV tutorial "Features2D + Homography to find a known object".

Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).

[Image: feature matching example (source: tutorial example)]
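
Condensed into a sketch, the tutorial's pipeline looks roughly like this (assuming OpenCV 4.4+ where SIFT lives in the main module; the file names are placeholders):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Condensed version of the Features2D + Homography idea, using SIFT.
    int main()
    {
        cv::Mat object = cv::imread("can_template.png", cv::IMREAD_GRAYSCALE);
        cv::Mat scene  = cv::imread("scene.png",        cv::IMREAD_GRAYSCALE);

        cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
        std::vector<cv::KeyPoint> kpObj, kpScene;
        cv::Mat descObj, descScene;
        sift->detectAndCompute(object, cv::noArray(), kpObj,   descObj);
        sift->detectAndCompute(scene,  cv::noArray(), kpScene, descScene);

        // Lowe's ratio test with a brute-force L2 matcher.
        cv::BFMatcher matcher(cv::NORM_L2);
        std::vector<std::vector<cv::DMatch>> knn;
        matcher.knnMatch(descObj, descScene, knn, 2);

        std::vector<cv::Point2f> objPts, scenePts;
        for (const auto& m : knn)
            if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance) {
                objPts.push_back(kpObj[m[0].queryIdx].pt);
                scenePts.push_back(kpScene[m[0].trainIdx].pt);
            }

        if (objPts.size() >= 4) {
            // Robust object-to-scene transform; the RANSAC inliers locate the can,
            // and projecting the template corners through H gives the bounding box.
            cv::Mat H = cv::findHomography(objPts, scenePts, cv::RANSAC);
        }
        return 0;
    }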

The processing takes a few hundred milliseconds for SIFT; SURF is a bit faster, but neither is suitable for real-time applications. ORB uses FAST, which is weaker with regard to rotation invariance.

The original papers for SIFT, SURF, and ORB are worth reading for the details.

若你有天会懂
Reply 3 · 2018-12-31 15:05

As an alternative to all these nice solutions, you can train your own classifier and make your application robust to errors. As an example, you can use Haar training, providing a good number of positive and negative images of your target.

It can be useful for extracting only cans, and it can be combined with the detection of transparent objects.
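
For the detection side, assuming a cascade has already been trained with the opencv_traincascade tool, the usage could look roughly like this (the XML file name is hypothetical):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Detection side only; assumes a cascade was already trained with the
    // opencv_traincascade tool. The XML file name is hypothetical.
    int main()
    {
        cv::CascadeClassifier canCascade;
        if (!canCascade.load("coke_can_cascade.xml"))
            return 1;

        cv::Mat image = cv::imread("scene.png");
        cv::Mat gray;
        cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);
        cv::equalizeHist(gray, gray);

        std::vector<cv::Rect> cans;
        canCascade.detectMultiScale(gray, cans, 1.1, 3);

        for (const cv::Rect& r : cans)
            cv::rectangle(image, r, cv::Scalar(0, 255, 0), 2);
        cv::imwrite("result.png", image);
        return 0;
    }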

君临天下
Reply 4 · 2018-12-31 15:08

Maybe I'm many years too late, but nevertheless here is a theory to try.

The ratio of the bounding rectangle of the red logo region to the overall dimensions of the bottle/can is different. For the can it should be about 1:1, whereas for the bottle it will be different (with or without the cap). This should make it easy to distinguish between the two.

Update: The horizontal curvature of the logo region will differ between the can and the bottle due to their respective size difference. This could be specifically useful if your robot needs to pick up the can/bottle and you decide the grip accordingly.
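
A small sketch of how the ratio check could look on top of the binarized red mask from the question (the largest-blob assumption and the "near 1:1" test are illustrative, not tested):

    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <vector>

    // Aspect ratio of the largest red blob, assumed to be the logo region.
    // "redMask" is the binarized red image from the question's pre-processing.
    double logoAspectRatio(const cv::Mat& redMask)
    {
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(redMask.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        if (contours.empty())
            return 0.0;

        // Pick the contour with the largest area, assumed to be the logo.
        auto largest = std::max_element(contours.begin(), contours.end(),
            [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b) {
                return cv::contourArea(a) < cv::contourArea(b);
            });

        cv::Rect box = cv::boundingRect(*largest);
        return static_cast<double>(box.width) / box.height;  // near 1 suggests a can
    }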

几人难应
Reply 5 · 2018-12-31 15:09

Looking at shape

Take a gander at the shape of the red portion of the can/bottle. Notice how the can tapers off slightly at the very top, whereas the bottle label is straight. You can distinguish between the two by comparing the width of the red portion along its length.
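
One way this width comparison could be sketched on the binarized red mask (the row positions and the 0.9 ratio are arbitrary assumptions):

    #include <opencv2/opencv.hpp>

    // Compare the width of the red region near its top with its middle width.
    // A can's red area tapers near the top; a bottle label stays roughly straight.
    // "redMask" is the binary mask of red pixels; the 0.9 ratio is an assumption.
    bool looksLikeCan(const cv::Mat& redMask)
    {
        cv::Rect box = cv::boundingRect(redMask);   // bounding box of non-zero pixels
        if (box.height < 10)
            return false;

        auto rowWidth = [&](int y) {
            return cv::countNonZero(redMask.row(y).colRange(box.x, box.x + box.width));
        };

        int topWidth = rowWidth(box.y + box.height / 10);   // near the top
        int midWidth = rowWidth(box.y + box.height / 2);    // middle of the region
        return topWidth < 0.9 * midWidth;                   // tapered => looks like a can
    }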

Looking at highlights

One way to distinguish between bottles and cans is the material. A bottle is made of plastic, whereas a can is made of aluminum. In sufficiently well-lit situations, looking at the specularity would be one way of telling a bottle label from a can label.

As far as I can tell, that is how a human would tell the difference between the two types of labels. If the lighting conditions are poor, there is bound to be some uncertainty in distinguishing the two anyway. In that case, you would have to be able to detect the presence of the transparent/translucent bottle itself.
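
If someone wanted to try the specularity cue, a rough sketch could look like this (the brightness threshold and the 2% ratio are pure assumptions):

    #include <opencv2/opencv.hpp>

    // Rough specularity cue: count near-saturated pixels inside the detected
    // label region. "labelBox" would come from an earlier detection step.
    bool hasMetallicHighlights(const cv::Mat& bgr, const cv::Rect& labelBox)
    {
        cv::Mat gray;
        cv::cvtColor(bgr(labelBox), gray, cv::COLOR_BGR2GRAY);

        int bright = cv::countNonZero(gray > 240);               // specular-looking pixels
        double ratio = static_cast<double>(bright) / gray.total();
        return ratio > 0.02;                                      // many highlights => can
    }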

只靠听说
Reply 6 · 2018-12-31 15:10

Fun problem: when I glanced at your bottle image I thought it was a can too. But, as a human, the way I told the difference was that I then noticed it was also a bottle...

So, to tell cans and bottles apart, how about simply scanning for bottles first? If you find one, mask out the label before looking for cans.

Not too hard to implement if you're already doing cans. The real downside is that it doubles your processing time. (But thinking ahead to real-world applications, you're going to end up wanting to do bottles anyway. ;-)
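
The masking step itself is tiny, assuming the (hypothetical) bottle pass returns a bounding box:

    #include <opencv2/opencv.hpp>

    // Once a bottle has been found, blank its region out of the binary mask
    // before running the can detector. "bottleBox" would come from the
    // hypothetical bottle-detection pass.
    void maskOutBottle(cv::Mat& redMask, const cv::Rect& bottleBox)
    {
        cv::rectangle(redMask, bottleBox, cv::Scalar(0), cv::FILLED);
    }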

笑指拈花
Reply 7 · 2018-12-31 15:13

I'm a few years late in answering this question. With the state of the art pushed to its limits by CNNs in the last 5 years, I wouldn't use OpenCV to do this task now! (I know you specifically wanted OpenCV features in the question.) I feel object detection algorithms such as Faster R-CNN, YOLO, SSD, etc. would ace this problem by a significant margin compared to OpenCV features. If I were to tackle this problem now (after 6 years!!), I would definitely use Faster R-CNN.
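
For completeness, modern OpenCV can at least run such a pre-trained detector through its dnn module; a rough sketch (the model/config file names are placeholders, and parsing the output depends on the specific network, so it is omitted):

    #include <opencv2/opencv.hpp>
    #include <opencv2/dnn.hpp>
    #include <iostream>

    // Rough sketch of running a pre-trained detector through OpenCV's dnn module.
    int main()
    {
        cv::dnn::Net net = cv::dnn::readNet("detector.weights", "detector.cfg");

        cv::Mat image = cv::imread("scene.png");
        cv::Mat blob = cv::dnn::blobFromImage(image, 1.0 / 255.0, cv::Size(416, 416),
                                              cv::Scalar(), /*swapRB=*/true, /*crop=*/false);
        net.setInput(blob);
        cv::Mat out = net.forward();

        std::cout << "output dims: " << out.dims
                  << ", total values: " << out.total() << std::endl;  // parse per model
        return 0;
    }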
