When a typical mobile phone user takes a picture of a card-sized object, some background texture is usually included in the image (please refer to the attached samples). In certain cases, that background can hurt OCR accuracy.
I am wondering whether there are ways to remove the background (I am positive that there are), or to detect the background regions so that one can crop them off before OCR. In the attached images, the wooden tables and countertops are the candidates for removal. I would imagine that contrasting colors could be a solution, but I am not so sure.
There are cases where you, as a human, have trouble telling background from foreground, so there is certainly no method that always does what you want correctly. Since you mention OCR, I assume you want to eliminate everything that is not text. This doesn't make the question any easier, so what I'm really assuming is that you want to keep objects that are highly contrasted against other objects (foreground against background, or black text on a white background, for example). Again, there is no perfect method for that.
So, all this answer is going to do is present a simple method that might help you in your task. The method combines ready-made morphological tools with the Otsu method for binarization, which is statistically optimal in the sense that it maximizes the between-class variance. The result is a set of regions that are potentially worth looking at. Note that you will certainly need to combine these results with many other analyses; a good OCR system goes far beyond these direct approaches.
The method:

1) Convert the image to grayscale (I am not interested in the colors, but a different method could certainly use them).
2) Use the h-dome transform to remove irrelevant maxima.
3) Calculate the morphological gradient.
4) Binarize by Otsu.
5) Remove small objects by area opening.

Removing irrelevant maxima is important for your task, since you can get pretty horrible regions caused by a bad camera combined with a bad flash and an inexperienced photographer. The h-dome transform is based on morphological reconstruction, so if your library has the latter but not the former, it is straightforward to implement (otherwise you could learn how to implement the latter efficiently). The morphological gradient for discrete images is very simple to apply and, being a local method, tends to work fine even with bad illumination. Thresholding its result by Otsu keeps the strongest edges (which possibly include noise and other minor features). You could precede all this with Gaussian smoothing as an initial noise-suppression step. The small features are readily removed by area opening.

In Matlab, all of this can be done with Image Processing Toolbox functions, assuming that objects smaller than 50 pixels are irrelevant (which might not always be the case for small text).
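For readers without Matlab, steps 2 to 5 can be sketched in plain Python on a grayscale image stored as a list of rows of integers in 0..255. This is only an illustration under assumptions, not the code behind the results in this answer: all function names are made up for the example, the value `height=40` is an arbitrary guess, and "removing irrelevant maxima" is read here as keeping the reconstruction of the image minus `height` under the image itself (the h-domes are then the difference between the image and that result). The reconstruction loop is the naive one, so it is only suitable for small images.

```python
from collections import deque

def neighbors(r, c, h, w):
    """8-connected neighbours of (r, c) inside an h x w image."""
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w]

def reconstruct(marker, mask):
    """Grayscale reconstruction by dilation of marker under mask (marker <= mask)."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in marker]
    queue = deque((r, c) for r in range(h) for c in range(w))
    while queue:  # propagate values until nothing changes any more
        r, c = queue.popleft()
        for nr, nc in neighbors(r, c, h, w):
            v = min(out[r][c], mask[nr][nc])
            if v > out[nr][nc]:
                out[nr][nc] = v
                queue.append((nr, nc))
    return out

def suppress_maxima(img, height):
    """Flatten maxima shallower than height (assumed reading of step 2)."""
    marker = [[max(v - height, 0) for v in row] for row in img]
    return reconstruct(marker, img)

def gradient(img):
    """Morphological gradient: 3x3 dilation minus 3x3 erosion."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[nr][nc] for nr, nc in neighbors(r, c, h, w)] + [img[r][c]]
            row.append(max(window) - min(window))
        out.append(row)
    return out

def otsu_threshold(img, levels=256):
    """Threshold t maximising the between-class variance; foreground is v > t."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    total_sum = sum(i * n for i, n in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (total_sum - sum0) / w1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def area_open(binary, min_area):
    """Drop 8-connected components with fewer than min_area pixels."""
    h, w = len(binary), len(binary[0])
    out = [[False] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            comp, queue = [], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in neighbors(y, x, h, w):
                    if binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) >= min_area:
                for y, x in comp:
                    out[y][x] = True
    return out

def candidate_regions(gray, height=40, min_area=50):
    """Steps 2-5 on a grayscale image; returns a boolean mask of strong edges."""
    flat = suppress_maxima(gray, height)
    grad = gradient(flat)
    t = otsu_threshold(grad)
    return area_open([[v > t for v in row] for row in grad], min_area)
```

Calling `candidate_regions(gray)` on your own grayscale data returns a mask of the same size. Both `height` and `min_area` need tuning per image set, and a real implementation would use a library's reconstruction routine rather than this slow loop.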
Here are the images for your examples:

These outputs give an indication of where you should look for text, i.e., the interior of the connected components.
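One way to turn such a binary output into regions to crop before OCR is to take the bounding box of each connected component. The sketch below is plain Python on a mask stored as a list of rows; `component_boxes` and `crop` are hypothetical helper names, and using the bounding box as a stand-in for "the interior of a component" is a simplification on my part.

```python
from collections import deque

def component_boxes(binary):
    """Bounding box (top, left, bottom, right), inclusive, of each 8-connected component."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # grow the box to include the current pixel
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # visit the 3x3 neighbourhood, clipped to the image
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def crop(image, box):
    """Cut the inclusive box out of an image stored as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Each box can then be cropped from the original image with `crop(image, box)` and passed to the OCR step. Boxes that are very small or very elongated are probably not text and could be filtered out first.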