I need to extract words from small images like this:
I am using Tesseract from the command line with the Spanish language option, like this:
tesseract category.png -l spa -psm 7 category.txt
I would expect this text to be easy for the OCR to parse, but the word is not recognized. I am using -l spa for the Spanish language and -psm 7 because the image contains only one line of text (the result is the same if I omit the -psm parameter).
This is the result: s…"…
I am using this build with the language package: http://domasofan.spdns.eu/tesseract/ (official source cited on GitHub)
Tesseract seems to really struggle when scanning low-resolution characters.
Try scanning this image instead. I upscaled it by 400 percent (200 percent is probably enough for scanning, but let's try 400%), applied a heavy blur, and then thresholded it at a value of about 140. The results should be much better. If you need to do this programmatically, tell me in the comments what is unclear and I will provide more details.
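For anyone who wants to try the same steps in code, here is a minimal, dependency-free sketch of the upscale, blur, threshold sequence on a grayscale pixel grid. Everything in it is an assumption for illustration: the function names are hypothetical, the upscaling is nearest-neighbour with a factor of 4 (one reading of "400 percent"), and a simple box blur stands in for whatever blur the image editor applied. Only the cutoff of 140 comes from the answer above. In practice an imaging library would do the same operations faster and with better interpolation.

```python
def upscale(pixels, factor):
    """Nearest-neighbour upscale of a grayscale grid (list of rows, values 0-255)."""
    out = []
    for row in pixels:
        # Repeat every pixel `factor` times horizontally...
        wide = [v for v in row for _ in range(factor)]
        # ...and every widened row `factor` times vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


def box_blur(pixels, radius=1):
    """Replace each pixel with the mean of its neighbours within `radius`.

    A box blur is an assumption here; the answer does not say which blur was used.
    Near the borders only the part of the window inside the image is averaged.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(h):
        new_row = []
        for x in range(w):
            total = count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += pixels[yy][xx]
                    count += 1
            new_row.append(total // count)
        out.append(new_row)
    return out


def threshold(pixels, cutoff=140):
    """Binarize: values at or above `cutoff` become white (255), the rest black (0)."""
    return [[255 if v >= cutoff else 0 for v in row] for row in pixels]


def preprocess(pixels, factor=4, radius=2, cutoff=140):
    """Upscale, blur, then threshold, in the order described in the answer."""
    return threshold(box_blur(upscale(pixels, factor), radius), cutoff)
```

The result is a pure black-and-white grid. Once it is written back to an image file, the same tesseract command from the question can be run on that file instead of the original. The blur radius and cutoff will likely need tuning per image.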