Train Tesseract for specific words - possible?

Posted 2019-07-15 04:56

I want to use Tesseract to extract about 10-20 keywords from a document. The document contains only English characters and words. What I am interested in is something like "Age: 23". Here "Age" is the keyword I care about, and I want to extract its value (23) as well.

The first approach that comes to mind is to extract the whole page as text and then search the recognized text for the keywords. But in terms of training Tesseract, is there a better approach, given that I know the keywords in advance, that might give better accuracy?
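The "recognize everything, then search" approach can be sketched as follows. This is only a minimal illustration, assuming the fields appear as "Keyword: value" on a single line; the helper name extract_fields and the sample text are made up for the example, and in practice the text would come from the OCR step (for instance pytesseract.image_to_string).

```python
import re


def extract_fields(text, keywords):
    """Return {keyword: value} for each 'Keyword: value' pair found in OCR text."""
    fields = {}
    for keyword in keywords:
        # Match the keyword as a whole word, allow optional whitespace around
        # the colon, and capture the rest of the line as the value.
        pattern = re.compile(
            r"\b" + re.escape(keyword) + r"\s*:\s*(\S[^\r\n]*)",
            re.IGNORECASE,
        )
        match = pattern.search(text)
        if match:
            fields[keyword] = match.group(1).strip()
    return fields


# Hypothetical OCR output; real text would come from Tesseract.
sample_text = "Name: Jane Doe\nAge: 23\nCity : Springfield\n"
print(extract_fields(sample_text, ["Age", "City", "Height"]))
```

Keywords that are not found (here "Height") are simply left out of the result, so the caller can tell a missing field from an empty one.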

I am more or less aware of the limitations of Tesseract OCR and am trying to get the most out of it within those limitations. Thanks for all your expert advice.

Tags: ocr tesseract
1 answer
祖国的老花朵
#2 · 2019-07-15 05:24

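Try the bazaar pattern matching in Tesseract: the bazaar config file makes Tesseract load a user-words list and a user-patterns list, so you can supply the keywords you already know.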
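As a rough sketch of what supplying known keywords could look like, the snippet below writes a word list and a pattern list and builds a Tesseract command line that points at them with the --user-words and --user-patterns options. The helper name build_tesseract_command, the file names, the image name, and the example pattern \d\d (two digits) are all assumptions for illustration. How much these lists influence the result may depend on the Tesseract version and engine mode, so treat this as something to experiment with rather than a guaranteed improvement.

```python
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, keywords, patterns, work_dir):
    """Write user-words / user-patterns files and return a tesseract command list."""
    work_dir = Path(work_dir)
    words_file = work_dir / "keywords.user-words"        # hypothetical file name
    patterns_file = work_dir / "keywords.user-patterns"  # hypothetical file name

    # One entry per line in each file.
    words_file.write_text("\n".join(keywords) + "\n", encoding="utf-8")
    patterns_file.write_text("\n".join(patterns) + "\n", encoding="utf-8")

    return [
        "tesseract", str(image_path), str(output_base),
        "-l", "eng",
        "--user-words", str(words_file),
        "--user-patterns", str(patterns_file),
    ]


work_dir = tempfile.mkdtemp()
command = build_tesseract_command(
    "scan.png", "scan_out", ["Age", "Name"], [r"\d\d"], work_dir
)
print(" ".join(command))
# To actually run it (requires the tesseract binary and a real image):
#   subprocess.run(command, check=True)
```

The recognized text can then be searched for the keywords as usual; the word and pattern lists only nudge recognition toward the expected vocabulary.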
