Analysis and transformation of the image on the ba

Posted 2019-05-21 09:57

Question:

I have an OCR project, but it only works well on images in which the text is fairly straight, that is, neither rotated nor upside down. I want the OCR to recognize text in any image, even an upside-down one, but I don't know which approaches can solve this problem.

I need something like an analysis of the lines of letters, but even then I can't tell whether a line is upside down.

Answer 1:

If the images you are performing OCR on are from a magazine or book where there is lots of text on multiple lines, I suggest trying to find the rotation of the page.

Probably the simplest way to do this is to apply the Hough transform for lines. Since the empty space between lines of text should form a broad white line, this could work without any preprocessing of the image. Otherwise, try blurring the image or using the "close" morphological operation to turn the lines of text into opaque blocks.
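As a rough illustration of the "close" operation mentioned above, here is a minimal pure-Python sketch. It assumes a binary image stored as a list of rows with ink pixels set to 1, and a purely horizontal structuring element; the function names and the `width` parameter are made up for this example and do not come from any library.

```python
def _dilate_row(row, half):
    # A pixel becomes ink if any in-bounds pixel within `half` of it is ink.
    n = len(row)
    return [1 if any(row[max(0, i - half):min(n, i + half + 1)]) else 0
            for i in range(n)]


def _erode_row(row, half):
    # A pixel stays ink only if every in-bounds pixel within `half` of it is ink.
    n = len(row)
    return [1 if all(row[max(0, i - half):min(n, i + half + 1)]) else 0
            for i in range(n)]


def close_horizontally(image, width=3):
    """Morphological closing (dilation, then erosion) along each row.

    Gaps narrower than `width` pixels between ink pixels are filled, so the
    letters of one text line merge into a solid block.
    """
    half = width // 2
    return [_erode_row(_dilate_row(row, half), half) for row in image]


if __name__ == "__main__":
    row = [1, 0, 1, 0, 0, 0, 1]
    # The one-pixel gap is filled; the three-pixel gap is left open.
    print(close_horizontally([row], width=3))
```

A wider element merges whole words and lines; an image library's closing routine would normally replace this hand-written version.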

Once you have found the lines in the image with the Hough transform, extract the principal angle of rotation (for example, the mean angle of all lines) and rotate the image back by that angle.
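The idea can be sketched without any imaging library. The following is a minimal, unoptimized Hough accumulator, assuming a binary image given as a list of rows with ink (or closed text-line) pixels set to 1. The function name and the `angle_step` parameter are invented for this example, and instead of averaging all detected lines it simply takes the single strongest one.

```python
import math
from collections import Counter


def estimate_skew_degrees(image, angle_step=1.0):
    """Return the direction of the strongest straight line, in degrees.

    The angle is measured from the x-axis toward the y-axis in image
    coordinates (y grows downward) and lies in (-90, 90]. A value of 0
    means the dominant lines are horizontal.
    """
    points = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value]
    best_votes, best_theta = 0, 90.0
    for k in range(int(round(180 / angle_step))):
        theta = k * angle_step  # angle of the line's normal
        c = math.cos(math.radians(theta))
        s = math.sin(math.radians(theta))
        # Each ink pixel votes for the line (theta, rho) passing through it.
        votes = Counter(round(x * c + y * s) for x, y in points)
        peak = max(votes.values(), default=0)
        if peak > best_votes:
            best_votes, best_theta = peak, theta
    skew = best_theta - 90.0  # a line runs perpendicular to its normal
    if skew <= -90.0:
        skew += 180.0
    return skew


if __name__ == "__main__":
    # Three horizontal "text lines" in a 60x30 image.
    page = [[1 if y in (5, 15, 25) else 0 for _ in range(60)] for y in range(30)]
    print(estimate_skew_degrees(page))
```

Rotating the image by the negative of the returned angle should straighten it. Note that this only recovers the skew modulo 180 degrees, so an upside-down page looks the same as an upright one here.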



Answer 2:

My answer will be very high level because, as you can imagine, this is not simple. You are probably doing some sort of image segmentation, where you segment each character of your text. But to recognize the characters even when they are rotated, you need a feature vector with rotation-invariant characteristics. Some approaches people use for this:

Zernike moments

Neocognitron neural network - widely used for handwriting

I don't think it's a simple task.
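To make "rotation-invariant feature" concrete, here is a small sketch of Zernike moment magnitudes computed on a segmented character. It assumes a square binary glyph stored as a list of rows, mapped onto the unit disc around its centre; the function names are made up for this example, the discrete normalisation is only correct up to a constant factor, and the orders `n`, `m` must satisfy `|m| <= n` with `n - |m|` even.

```python
import cmath
import math


def radial_poly(n, m, rho):
    # Zernike radial polynomial R_n^|m|(rho).
    m = abs(m)
    total = 0.0
    for k in range((n - m) // 2 + 1):
        num = (-1) ** k * math.factorial(n - k)
        den = (math.factorial(k)
               * math.factorial((n + m) // 2 - k)
               * math.factorial((n - m) // 2 - k))
        total += num / den * rho ** (n - 2 * k)
    return total


def zernike_magnitude(image, n, m):
    """|Z_nm| of a square image; unchanged when the glyph is rotated."""
    size = len(image)
    centre = (size - 1) / 2.0
    radius = size / 2.0
    acc = 0j
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if not value:
                continue
            dx, dy = (x - centre) / radius, (y - centre) / radius
            rho = math.hypot(dx, dy)
            if rho > 1.0:
                continue  # pixels outside the unit disc are ignored
            theta = math.atan2(dy, dx)
            acc += value * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return abs(acc) * (n + 1) / math.pi


def rotate90(image):
    # Rotate a list-of-rows image by 90 degrees clockwise.
    return [list(row) for row in zip(*image[::-1])]


if __name__ == "__main__":
    glyph = [[int(ch) for ch in line] for line in (
        "0000000", "0100000", "0100000", "0100000",
        "0100000", "0111000", "0000000")]
    for n, m in [(2, 0), (2, 2), (3, 1), (4, 2)]:
        print(n, m, zernike_magnitude(glyph, n, m),
              zernike_magnitude(rotate90(glyph), n, m))
```

A rotation only changes the phase of each moment, so the magnitudes printed for the glyph and its rotated copy should agree up to floating-point error. A vector of such magnitudes could then serve as the feature vector fed to a classifier.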



Answer 3:

Not sure if you are creating an OCR engine or using one. Most commercial OCR engines can detect that a page is upside down (or rotated by 90 degrees) and auto-rotate it. For example, my company's GlyphReader OCR Engine can do that.

One simple solution is to take a portion of your image and run it through the engine at each of the four angles (0, 90, 180 and 270 degrees) until you get back a good amount of recognized text. You can use a dictionary to check whether what you are getting back is actual words, and confidence levels to see how sure the engine is of its recognition.
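A minimal sketch of that four-angle trial follows. The `recognize` argument is a placeholder for whatever OCR call you use (it is assumed to take an image and return a string), the image is assumed to be a list of rows, and `word_score` and `best_orientation` are names invented for this example. Unlike the early-exit described above, this version tries all four angles and keeps the best one.

```python
def rotate90(image):
    # Rotate a list-of-rows image by 90 degrees clockwise.
    return [list(row) for row in zip(*image[::-1])]


def word_score(text, dictionary):
    # Fraction of recognized tokens that are known dictionary words.
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(w in dictionary for w in words) / len(words)


def best_orientation(image, recognize, dictionary):
    """Try 0, 90, 180 and 270 degrees of clockwise rotation.

    Returns (angle, score) for the rotation whose OCR output contains the
    largest share of dictionary words.
    """
    best_angle, best_score = 0, -1.0
    current = image
    for angle in (0, 90, 180, 270):
        score = word_score(recognize(current), dictionary)
        if score > best_score:
            best_angle, best_score = angle, score
        current = rotate90(current)
    return best_angle, best_score


if __name__ == "__main__":
    upright = [[1, 2], [3, 4]]

    def fake_engine(img):
        # Stand-in for a real OCR engine: only the upright image reads well.
        return "the quick brown fox" if img == upright else "xq zvk"

    sideways = rotate90(upright)
    print(best_orientation(sideways, fake_engine, {"the", "quick", "brown", "fox"}))
```

If your engine also reports confidence values, they could be combined with the dictionary score; how to weight the two is a judgment call not covered here.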

If your engine can report confidence levels, and they are consistently below some threshold, then you should stop and check whether the document is rotated.

For 90 and 270 degrees, a Hough transform will tell you whether the lines in the image are horizontal or vertical. It can also tell you if they are just slightly rotated off the horizontal, so that you can correct that as well.
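Given a dominant line angle from a Hough transform, the horizontal/vertical decision and the leftover skew can be sketched as below. The function name and the 15-degree `tolerance` default are arbitrary choices for this example, and the angle is assumed to be the line direction in degrees.

```python
def classify_line_angle(angle_degrees, tolerance=15.0):
    """Classify a dominant line direction and return the residual skew.

    Returns ("horizontal" | "vertical" | "ambiguous", residual), where the
    residual is how far the lines are from the nearest axis, in degrees.
    """
    a = (angle_degrees + 90.0) % 180.0 - 90.0  # normalise to [-90, 90)
    if abs(a) <= tolerance:
        return "horizontal", a
    if 90.0 - abs(a) <= tolerance:
        return "vertical", (a - 90.0 if a > 0 else a + 90.0)
    return "ambiguous", a


if __name__ == "__main__":
    for angle in (2.0, 88.0, -87.0, 45.0):
        print(angle, classify_line_angle(angle))
```

A "vertical" result narrows the candidates to 90 or 270 degrees, and a "horizontal" one to 0 or 180; line angles alone cannot separate the two members of each pair, so that last step still needs something like the recognition trial described above.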