Junk results when using Tesseract OCR and tess-two

Posted 2019-01-20 19:13

Question:

I have developed an OCR application using the Tesseract OCR library, with reference to the following links.

  1. android-ocr
  2. tesseract

But I sometimes get junk data as results. Can anyone tell me what else I can do to get accurate results?

Answer 1:

You should provide your test images, as well as any code you are using, if you want specific help for your case. As a general rule of thumb for getting accurate results:

  • Use a high-resolution image (rescale it if needed); 300 DPI is the minimum

  • Make sure there are no shadows or bends in the image

  • If there is any skew, you will need to fix the image in code prior to OCR

  • Use a dictionary to help get good results

  • Adjust the text size (12 pt font is ideal)

  • Binarize the image and use image processing algorithms to remove noise
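The binarization step in the last bullet can be sketched as follows. This is only an illustration under stated assumptions: it uses Otsu's method, which is one common way to pick a threshold and is not something the answer prescribes, and it is written in plain Python on a 2D list of grey levels (0-255) rather than against the tess-two or Android APIs. The names `otsu_threshold` and `binarize` are hypothetical helpers, not library functions.

```python
def otsu_threshold(pixels):
    """Return a grey level t (0-255) that maximises between-class variance.

    `pixels` is assumed to be a 2D list of integer grey levels in 0..255.
    """
    # Build a histogram of grey levels.
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels with level <= t (background class)
    sum_bg = 0    # sum of grey levels in the background class
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no pixels at or below this level yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the background class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(pixels):
    """Map every pixel to 0 (dark, assumed text) or 255 (light, assumed paper)."""
    t = otsu_threshold(pixels)
    return [[0 if v <= t else 255 for v in row] for row in pixels]


if __name__ == "__main__":
    sample = [[10, 10, 200],
              [200, 10, 200]]
    print(binarize(sample))
```

A global threshold like this tends to work on evenly lit scans; for images with shadows, an adaptive (per-region) threshold is usually the better choice.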

On top of all this, there are many image processing functions that can help increase accuracy, such as deskew, perspective correction, line removal, border removal, dot removal, despeckle, and many more, depending on your image.
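As a rough illustration of one of these operations, the sketch below removes isolated dark pixels ("dot removal" or despeckle) from an already binarized image. It assumes a 2D list where 0 is dark and 255 is light, and it treats a dark pixel as noise only when none of its eight neighbours is dark; that rule is a simplifying assumption, and `despeckle` is a hypothetical helper name, not part of Tesseract or tess-two.

```python
def despeckle(binary):
    """Return a copy of `binary` with isolated dark pixels turned light.

    `binary` is assumed to be a 2D list containing only 0 (dark) and 255 (light).
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    out = [row[:] for row in binary]  # work on a copy, read from the original

    for y in range(height):
        for x in range(width):
            if binary[y][x] != 0:
                continue
            # Look at the 8-neighbourhood, clipped at the image borders.
            has_dark_neighbour = any(
                binary[ny][nx] == 0
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_dark_neighbour:
                out[y][x] = 255  # lone dot: treat as noise
    return out


if __name__ == "__main__":
    image = [
        [255, 255, 255, 255],
        [255,   0, 255, 255],   # isolated dot, removed
        [255, 255, 255, 255],
        [255, 255,   0,   0],   # two adjacent dark pixels, kept
    ]
    for row in despeckle(image):
        print(row)
```

On real scans this rule can also erase genuine single-pixel marks such as small periods at low resolution, so it is best applied after the image has been scaled up.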