I am using the open-source Tesseract engine for OCR to read text from images, but I have never gotten a 100% accurate result. Please give your suggestions for improving OCR quality with Tesseract. Thanks.
Question:
Answer 1:
Here is how to get the best results from Tesseract. Make sure you have preprocessed the image. OCR produces the best results on images that have been prepared as follows:
- fix the DPI if needed (300 DPI is the minimum)
- fix the text size (e.g. 12 pt should be OK)
- try to fix the text lines (deskew and dewarp the text)
- try to fix the illumination of the image (e.g. no dark parts of the image)
- binarize and de-noise the image
https://groups.google.com/forum/?fromgroups=#!topic/tesseract-ocr/g5aE_OvgyTU
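As a rough illustration of the DPI, de-noise and binarize steps listed above, here is a minimal pure-Python sketch. It is not taken from the linked thread. The function names (`upscale_factor`, `median_filter`, `otsu_threshold`, `binarize`) are hypothetical, the image is assumed to be a grayscale list of rows with values 0-255, and Otsu's method and a 3x3 median filter are just one common choice for these steps. In practice you would probably use an image library such as OpenCV or Pillow instead of hand-written loops.

```python
from statistics import median_low

TARGET_DPI = 300  # minimum DPI suggested in the answer above


def upscale_factor(current_dpi, target_dpi=TARGET_DPI):
    """Return the resize factor needed to reach target_dpi (never shrinks)."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def median_filter(gray):
    """De-noise with a 3x3 median filter; borders use the neighbours that exist."""
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(median_low(window))
        result.append(row)
    return result


def otsu_threshold(gray):
    """Pick the threshold that maximises between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += hist[level]
        sum_bg += level * hist[level]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray):
    """Map pixels above the Otsu threshold to white (255), the rest to black (0)."""
    threshold = otsu_threshold(gray)
    return [[255 if value > threshold else 0 for value in row] for row in gray]


if __name__ == "__main__":
    # Tiny synthetic "scan": dark text (20) on a light page (230).
    page = [[20, 20, 230, 230], [20, 20, 230, 230]]
    print(upscale_factor(150))
    print(binarize(median_filter(page)))
```

A 3x3 median filter also erases strokes that are only one pixel wide, which is one more reason to bring the image up to at least 300 DPI before de-noising. Deskewing and illumination correction are not covered by this sketch.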