Is it normal that tesseract does not recognize thi

Posted 2019-03-02 23:15

I need to extract words from small images like this:

I am using tesseract from the command line with the Spanish language option, like this:

tesseract category.png -l spa -psm 7 category.txt

I think this text should be easy for the OCR to parse, but the word is not recognized. I am using -l spa for the Spanish language and -psm 7 because the image contains only one line (the result is the same if I omit the -psm parameter).

This is the result: s…"…

I am using this build with the lang package: http://domasofan.spdns.eu/tesseract/ (official source cited in github)

Tags: ocr tesseract

1 answer

啃猪蹄的小仙女

#2 · 2019-03-02 23:21

Tesseract seems to really struggle when scanning low-resolution characters.

Try scanning this image. I upscaled it by 400 percent (200 percent would probably be enough for scanning, but let's try 400%), applied a heavy blur, and thresholded it at a value of about 140. The results should be much better, and I hope this helps. If you need to do this programmatically, say in the comments what is unclear and I will provide more information.
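
For anyone who wants to try the three steps programmatically, here is a minimal sketch in pure Python, working on a grayscale image held as a list of rows of 0-255 integers. Only the threshold of about 140 comes from the answer. The function names, reading "400 percent" as a 4x scale factor, the nearest-neighbour upscale, and the box blur with radius 2 are all assumptions made for illustration. In practice an imaging library would load and save the file and would run these steps much faster.

```python
def upscale(img, factor):
    """Nearest-neighbour upscale: repeat every pixel `factor` times in x and y."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def box_blur(img, radius):
    """Replace each pixel by the mean of its (2*radius+1)^2 window.

    The window is cropped at the image borders. A box blur is an assumed
    stand-in; the answer does not say which blur was used.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        ys = range(max(0, y - radius), min(h, y + radius + 1))
        row = []
        for x in range(w):
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            row.append(total // (len(ys) * len(xs)))
        out.append(row)
    return out


def threshold(img, cutoff=140):
    """Binarize: pixels brighter than `cutoff` become white, the rest black."""
    return [[255 if p > cutoff else 0 for p in row] for row in img]


def preprocess(img, factor=4, radius=2, cutoff=140):
    """Upscale, blur, then threshold, in the order the answer describes."""
    return threshold(box_blur(upscale(img, factor), radius), cutoff)


# Usage: a 3x3 image with one dark pixel becomes a 12x12 black-and-white image.
tiny = [
    [255, 255, 255],
    [255,   0, 255],
    [255, 255, 255],
]
result = preprocess(tiny)
print(len(result), len(result[0]))  # 12 12
```

The blur radius and the cutoff interact: a stronger blur pulls thin strokes toward the background, so the cutoff may need to move with it. Both values are worth tuning per image set rather than treating 2 and 140 as fixed.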
