Prepare complex image for OCR

Posted 2019-02-04 19:00

Question:

I want to recognize the digits on a credit card. To make things worse, the source image is not guaranteed to be of high quality. The OCR itself will be done with a neural network, but that shouldn't be the topic here.

The current issue is the image preprocessing. As credit cards can have backgrounds and other complex graphics, the text is not as clear as with a scanned document. I experimented with edge detection (Canny, Sobel), but it wasn't very successful. Calculating the difference between the greyscale image and a blurred one (as suggested in Remove background color in image processing for OCR) did not lead to an OCRable result either.

I think most approaches fail because the contrast between a specific digit and its background is not strong enough. The image probably needs to be segmented into blocks, with the best preprocessing found for each block?

Do you have any suggestions on how to convert the source into a readable binary image? Is edge detection the way to go, or should I stick with basic color thresholding?

Here is a sample of a greyscale-thresholding approach (where I am obviously not happy with the results):

(Images: original, greyscale, thresholded)

Thanks for any advice, Valentin

Answer 1:

If it's at all possible, request that better lighting be used to capture the images. A low-angle light would illuminate the edges of the raised (or sunken) characters, thus greatly improving the image quality. If the image is meant to be analyzed by a machine, then the lighting should be optimized for machine readability.

That said, one algorithm you should look into is the Stroke Width Transform, which is used to extract characters from natural images.

Stroke Width Transform (SWT) implementation (Java, C#...)

A global threshold (for binarization or clipping edge strengths) probably won't cut it for this application, and instead you should look at localized thresholds. In your example images the "02" following the "31" is particularly weak, so searching for the strongest local edges in that region would be better than filtering all edges in the character string using a single threshold.
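
As a concrete starting point for localized thresholding, here is a minimal C++/OpenCV sketch using cv::adaptiveThreshold, which compares each pixel against the mean of its own neighborhood instead of one global value. The block size and offset are assumptions to tune, not known-good values for card images:

#include <opencv2/opencv.hpp>

int main() {
    // Load the card image directly in greyscale.
    cv::Mat src = cv::imread("card.png", cv::IMREAD_GRAYSCALE);

    // Each pixel is thresholded against the mean of its 31x31 neighborhood
    // minus 10; both numbers are starting points to experiment with.
    cv::Mat bin;
    cv::adaptiveThreshold(src, bin, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY, 31, 10);

    cv::imwrite("card_binary.png", bin);
    return 0;
}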

If you can identify partial segments of characters, then you might use some directional morphology operations to help join segments. For example, if you have two nearly horizontal segments like the following, where 0 is the background and 1 is the foreground...

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0
0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0

then you could perform a morphological "close" operation along the horizontal direction only to join those segments. The kernel could be something like

x x x x x
1 1 1 1 1
x x x x x
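
In OpenCV terms, a horizontal-only close along these lines might be sketched as follows. Since OpenCV structuring elements have no don't-care entries, a flat 5x1 kernel stands in for the "1 1 1 1 1" row above; the width is an assumption to tune against the actual gap sizes:

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat bin = cv::imread("card_binary.png", cv::IMREAD_GRAYSCALE);

    // 5 pixels wide, 1 pixel tall: closes gaps along rows only.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 1));

    cv::Mat closed;
    cv::morphologyEx(bin, closed, cv::MORPH_CLOSE, kernel);

    cv::imwrite("card_closed.png", closed);
    return 0;
}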

There are more sophisticated methods to perform curve completion using Bezier fits or even Euler spirals (a.k.a. clothoids), but preprocessing to identify segments to be joined and postprocessing to eliminate poor joins can get very tricky.



Answer 2:

The way I would go about the problem is to separate the card into different sections. There are not many unique credit card layouts to begin with (MasterCard, Visa, the list is up to you), so you could offer a dropdown to specify which credit card it is. That way, you can narrow things down to a specific pixel area:

Example:

Only work with the rectangle spanning from 30 pixels in from the left to 10 pixels in from the right, and from 20 pixels to 30 pixels up from the bottom - this would cover all MasterCards.
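
As a sketch, cropping such a region with OpenCV could look like this; the coordinates are hypothetical placeholders, not measured MasterCard offsets:

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat card = cv::imread("card.png");

    // Hypothetical margins for the number strip, measured in from
    // the card edges as described above.
    int left = 30, right = 10, top = 30, bottom = 20;
    cv::Rect numberStrip(left,
                         card.rows - top,
                         card.cols - left - right,
                         top - bottom);

    cv::Mat roi = card(numberStrip);   // a view into the card image, no copy
    cv::imwrite("number_strip.png", roi);
    return 0;
}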

When I worked with image processing programs (a fun project), I turned up the contrast of the picture, converted it to greyscale, took the average of the RGB values of each pixel, and compared it to the surrounding pixels:

Example:

PixAvg[i,j] = (Pix.R + Pix.G + Pix.B)/3
if (abs(PixAvg[i,j] - PixAvg[i,j+1]) > 30)
    boolEdge = true;

Here, 30 is how distinct you want an edge to be. The lower the threshold, the lower the tolerance, and the more pixels will be marked as edges.

In my project, to view the edge detection, I made a separate array of booleans holding the boolEdge values, and a pixel array filled with only black and white dots: boolEdge = true becomes a white dot, boolEdge = false a black dot. So in the end you get a pixel array (the full picture) that contains just white and black dots.

From there, it is much easier to detect where a number starts and where a number finishes.
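
A minimal C++/OpenCV sketch of that black-and-white edge map might look like the following; the tolerance of 30 comes from the snippet above, while the rest is an assumed reconstruction:

#include <cstdlib>
#include <opencv2/opencv.hpp>

int main() {
    // Greyscale loading collapses RGB to one channel (a weighted
    // average rather than the plain mean used above).
    cv::Mat grey = cv::imread("card.png", cv::IMREAD_GRAYSCALE);

    // Start all black; flip a pixel to white wherever the difference
    // to its right-hand neighbor exceeds the tolerance.
    cv::Mat edges = cv::Mat::zeros(grey.size(), CV_8UC1);
    const int tolerance = 30;

    for (int j = 0; j < grey.rows; j++) {
        for (int i = 0; i + 1 < grey.cols; i++) {
            int diff = std::abs(grey.at<unsigned char>(j, i) -
                                grey.at<unsigned char>(j, i + 1));
            if (diff > tolerance)
                edges.at<unsigned char>(j, i) = 255;   // boolEdge = true -> white dot
        }
    }

    cv::imwrite("edges.png", edges);
    return 0;
}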



Answer 3:

In my implementation I tried to use the code from here: http://rnd.azoft.com/algorithm-identifying-barely-legible-embossed-text-image/. The results are better, but still not good enough... I find it hard to find the right parameters for textured cards.

- (void)processingByStrokesMethod:(cv::Mat)src dst:(cv::Mat*)dst {
    cv::Mat tmp;
    cv::GaussianBlur(src, tmp, cv::Size(3,3), 2.0);   // Gaussian blur
    tmp = cv::abs(src - tmp);                         // difference between source image and blurred image

    // Binarization:
    cv::threshold(tmp, tmp, 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);

    // Using the method of strokes:
    int Wout = 12;
    int Win = Wout / 2;
    int startXY = Win;
    int endY = src.rows - Win;
    int endX = src.cols - Win;

    for (int j = startXY; j < endY; j++) {
        for (int i = startXY; i < endX; i++) {
            // Only edge pixels:
            if (tmp.at<unsigned char>(j,i) == 255)
            {
                // Calculating maxP and minP within the Win-region:
                unsigned char minP = src.at<unsigned char>(j,i);
                unsigned char maxP = src.at<unsigned char>(j,i);
                int offsetInWin = Win / 2;

                for (int m = -offsetInWin; m < offsetInWin; m++) {
                    for (int n = -offsetInWin; n < offsetInWin; n++) {
                        if (src.at<unsigned char>(j+m,i+n) < minP) {
                            minP = src.at<unsigned char>(j+m,i+n);
                        } else if (src.at<unsigned char>(j+m,i+n) > maxP) {
                            maxP = src.at<unsigned char>(j+m,i+n);
                        }
                    }
                }

                // Voting:
                unsigned char meanP = lroundf((minP + maxP) / 2.0);

                for (int l = -Win; l < Win; l++) {
                    for (int k = -Win; k < Win; k++) {
                        if (src.at<unsigned char>(j+l,i+k) >= meanP) {
                            dst->at<unsigned char>(j+l,i+k)++;
                        }
                    }
                }
            }
        }
    }

    // Normalization of imageOut:
    unsigned char maxValue = dst->at<unsigned char>(0,0);

    for (int j = 0; j < dst->rows; j++) {             // finding the max value of imageOut
        for (int i = 0; i < dst->cols; i++) {
            if (dst->at<unsigned char>(j,i) > maxValue)
                maxValue = dst->at<unsigned char>(j,i);
        }
    }
    float knorm = 255.0 / maxValue;

    for (int j = 0; j < dst->rows; j++) {             // normalization of imageOut
        for (int i = 0; i < dst->cols; i++) {
            dst->at<unsigned char>(j,i) = lroundf(dst->at<unsigned char>(j,i) * knorm);
        }
    }
}
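
For reference, a call site for this method might look like the sketch below. It assumes dst starts zero-initialized at the same size as src, which the voting step relies on, and that the normalized vote image is binarized afterwards for the OCR stage:

cv::Mat src = cv::imread("card.png", cv::IMREAD_GRAYSCALE);
cv::Mat dst = cv::Mat::zeros(src.size(), CV_8UC1);   // vote accumulator must start at zero

[self processingByStrokesMethod:src dst:&dst];

// Binarize the normalized vote image before feeding it to OCR.
cv::threshold(dst, dst, 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);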