I'm trying to automate a process where someone manually transcribes a printed code into its digital form.
Then I started reading about OCR, so I installed Tesseract OCR and tried it on some images. It doesn't detect anything even close to the code.
After reading some questions on Stack Overflow, I figured that the images need some preprocessing, such as deskewing the image to a horizontal orientation, which can be done with OpenCV for example.
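To illustrate what I mean by deskewing, here is a rough sketch using OpenCV's Java bindings (`src` is the captured image; the threshold flags and the angle correction are just one common recipe, not something I've verified on my images):

```java
import org.opencv.core.*;
import org.opencv.imgproc.Imgproc;

// Deskew sketch: binarize, find the minimum-area rectangle around the
// foreground pixels, then rotate the image back by the rectangle's angle.
Mat gray = new Mat();
Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY);
Imgproc.threshold(gray, gray, 0, 255, Imgproc.THRESH_BINARY_INV | Imgproc.THRESH_OTSU);

Mat points = new Mat();
Core.findNonZero(gray, points);
MatOfPoint2f coords = new MatOfPoint2f();
points.convertTo(coords, CvType.CV_32FC2);

RotatedRect box = Imgproc.minAreaRect(coords);
// minAreaRect reports angles in [-90, 0); fold them into a small correction.
double angle = box.angle < -45 ? box.angle + 90 : box.angle;

Mat rotation = Imgproc.getRotationMatrix2D(box.center, angle, 1.0);
Mat deskewed = new Mat();
Imgproc.warpAffine(src, deskewed, rotation, src.size(), Imgproc.INTER_CUBIC);
```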
Now my questions are:
- What kind of preprocessing or other methods should be used for an image like the one above?
- Secondly, can I rely on the output? Will it always work for images like the one above?
I hope someone can help me!
I have decided to capture the whole card instead of just the code. Capturing the whole card makes it possible to transform it to a flat, top-down perspective, from which I can easily extract the "code" region.
I also learned a lot along the way, especially regarding speed: this function is slow on high-resolution images and can take up to 10 seconds at a size of 3264 x 1836.
What I did to speed things up is resizing the input matrix by a factor of 1/4, which makes the detection 4^2 = 16 times faster at a minimal loss of precision. The next step is scaling the quadrangle we found back to the original size, so that we can transform it to a flat, top-down perspective using the original source image.

The code I created for detecting the largest area is heavily based on code I found on Stack Overflow. Unfortunately, those snippets didn't work as expected for me, so I combined several of them and modified them a lot. This is, in essence, what I got:
You can call it like so:
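For example (assuming the `findLargestQuad` sketch above; `Imgcodecs.imread` loads the image, and the OpenCV native library must be loaded first):

```java
Mat card = Imgcodecs.imread("card.jpg");

// 0.25 is the 1/4 downscale factor mentioned above.
MatOfPoint2f quad = QuadDetector.findLargestQuad(card, 0.25);

if (quad != null) {
    // quad holds the four corners in original-resolution coordinates,
    // ready to be warped to a top-down view (see the linked answer below).
}
```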
See this answer for quadrangle detection from a contour and transforming it to a flat, top-down perspective: Java OpenCV deskewing a contour
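For completeness, the transform itself boils down to something like this (a sketch; it assumes the corners in `quad` are already ordered top-left, top-right, bottom-right, bottom-left, which the linked answer takes care of, and that `width` and `height` are the desired output size):

```java
// Map the four detected corners onto an upright width x height rectangle.
MatOfPoint2f target = new MatOfPoint2f(
        new Point(0, 0),
        new Point(width - 1, 0),
        new Point(width - 1, height - 1),
        new Point(0, height - 1));

Mat transform = Imgproc.getPerspectiveTransform(quad, target);
Mat flattened = new Mat();
Imgproc.warpPerspective(card, flattened, transform, new Size(width, height));
```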