Tesseract OCR: is it possible to force a specific

Posted 2020-06-19 19:20

Question:

I'm using Tesseract and I want to develop an app that can recognize a sequence of characters. The results so far are good but not excellent.

The character sequence I want to read always has a specific pattern, let's say:

number number number char char (e.g. 123AB)

Is there a way to "tell" the OCR engine that the structure is always fixed, in order to improve the recognition results?

Thank you in advance.

Answer 1:

Try the user-patterns matching enabled by Tesseract's "bazaar" config, where \d stands for a digit and \c for a letter:

\d\d\d\c\c
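
As a rough illustration, the pattern above could be written to a user-patterns file and handed to the tesseract command line from Python. This is only a sketch: it assumes Tesseract 4 or later, where the --user-patterns option is available, and the file names "plate.user-patterns" and "code.png" are hypothetical. How strongly the engine honors user patterns may depend on the version and OCR engine mode, so results should be checked on real images.

```python
import tempfile
from pathlib import Path

# Pattern from the answer: three digits followed by two letters.
USER_PATTERN = r"\d\d\d\c\c"


def write_patterns_file(directory):
    """Write the pattern to a user-patterns file, one pattern per line."""
    path = Path(directory) / "plate.user-patterns"  # hypothetical file name
    path.write_text(USER_PATTERN + "\n", encoding="utf-8")
    return path


def build_command(image_path, patterns_path):
    """Build a tesseract CLI call; --psm 7 treats the image as one text line."""
    return [
        "tesseract", str(image_path), "stdout",
        "--user-patterns", str(patterns_path),
        "--psm", "7",
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        patterns = write_patterns_file(tmp)
        # "code.png" is a placeholder image name (assumption).
        print(" ".join(build_command("code.png", patterns)))
```

The resulting list can be passed to subprocess.run on a machine where tesseract is installed.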


Answer 2:

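You can use the "tessedit_char_whitelist" parameter to restrict recognition to the characters you expect. Note that it limits which characters may appear, not the positions they appear in.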
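
A minimal sketch of this approach, assuming the codes contain only digits and uppercase ASCII letters (as in the 123AB example): the whitelist is set with the -c option of the tesseract command line, and because a whitelist cannot enforce positions, the OCR output is then checked against the fixed layout with a regular expression. The function names and the image name are hypothetical, and whitelist support can vary between Tesseract versions and engine modes.

```python
import re
import string

# Allowed characters: digits and uppercase letters (assumption based on "123AB").
WHITELIST = string.digits + string.ascii_uppercase

# Fixed layout from the question: three digits, then two letters.
EXPECTED = re.compile(r"[0-9]{3}[A-Z]{2}")


def build_whitelist_command(image_path):
    """Build a tesseract CLI call that restricts the recognized characters."""
    return [
        "tesseract", str(image_path), "stdout",
        "--psm", "7",  # treat the image as a single text line
        "-c", f"tessedit_char_whitelist={WHITELIST}",
    ]


def matches_expected(text):
    """Return True if the OCR output has exactly the expected structure."""
    return EXPECTED.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    # "code.png" is a placeholder image name (assumption).
    print(" ".join(build_whitelist_command("code.png")))
    print(matches_expected("123AB\n"))
```

Results that fail the check can be rejected or re-scanned instead of being accepted silently.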