How do I improve the accuracy of the OCR text from

Posted 2019-02-11 00:38

I created a basic app that recognizes text using the Tesseract API from Google and integrated it with my camera app. It works, but accuracy is a problem: the text is sometimes recognized as a random set of characters, and I would guess the accuracy is about 50 percent.

Further, when it tries to scan more than four words in an image, the app crashes.

// Read the recognized text as a UTF-8 string, then release the native Tesseract resources.
String ocrText = baseApi.getUTF8Text();
baseApi.end();

where baseApi is the object of the Tesseract API class.

Do I need to use a different data structure to save the recognized text or is there some other reason why more than four words don't get recognized?

Tags: java android android-ndk ocr tesseract

1 answer

Lonely孤独者°

#2 · 2019-02-11 01:18

The Tesseract API class provides an isValidWord method that checks whether a string is a valid word. You can use it to check the recognized characters and discard tokens that are not words, which should improve the accuracy of the output.
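As a rough sketch of that filtering step: the code below splits the recognized text on whitespace and keeps only the tokens that a validity check accepts. The class name OcrWordFilter and the Predicate<String> parameter are hypothetical stand-ins; the exact name and signature of isValidWord depend on the wrapper you use, so you would pass in a small lambda that calls it. The ratio helper is an added assumption, useful for deciding whether a capture is too poor to keep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of Tesseract or any wrapper.
final class OcrWordFilter {

    // Returns the whitespace-separated tokens of ocrText that the check accepts.
    // isValidWord is assumed to wrap the wrapper's own word-validity call.
    static List<String> keepValidWords(String ocrText, Predicate<String> isValidWord) {
        List<String> kept = new ArrayList<>();
        if (ocrText == null) {
            return kept;
        }
        for (String token : ocrText.trim().split("\\s+")) {
            if (!token.isEmpty() && isValidWord.test(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Fraction of tokens accepted by the check; 0.0 when there are no tokens.
    static double validRatio(String ocrText, Predicate<String> isValidWord) {
        if (ocrText == null || ocrText.trim().isEmpty()) {
            return 0.0;
        }
        int total = ocrText.trim().split("\\s+").length;
        return (double) keepValidWords(ocrText, isValidWord).size() / total;
    }
}
```

A low ratio (for example, under half of the tokens passing) could be treated as a signal to ask the user to retake the photo rather than showing garbage text. The threshold is a judgment call, not something the library prescribes.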

I am developing with Tess4J, which is a Java JNA wrapper for tesseract-ocr, and it gives quite good results after this check.

Inaccurate results might also be due to the text size. The linked reference says: "Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi."
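To make that threshold concrete: one point is 1/72 inch, so 10pt text at 300dpi is about 10 / 72 * 300 ≈ 41.7 pixels tall, and 8pt is about 33.3 pixels. The sketch below does that arithmetic and estimates how much a camera image would need to be enlarged before recognition. The class and method names are hypothetical, and it assumes you can measure the approximate text height in pixels; a point size describes the full font height, so a measurement of lowercase letters alone will underestimate it.

```java
// Hypothetical helper for the point-size / DPI arithmetic; not part of Tesseract.
final class TextSizeEstimate {

    private static final double POINTS_PER_INCH = 72.0;

    // Pixel height of text with the given point size when rendered at the given DPI.
    static double pointSizeToPixels(double pointSize, double dpi) {
        return pointSize / POINTS_PER_INCH * dpi;
    }

    // Factor by which to enlarge an image so that text measured at measuredPx
    // pixels tall reaches the height of targetPointSize at the given DPI.
    // Returns 1.0 when the text is already large enough.
    static double upscaleFactor(double measuredPx, double targetPointSize, double dpi) {
        if (measuredPx <= 0) {
            throw new IllegalArgumentException("measuredPx must be positive");
        }
        double targetPx = pointSizeToPixels(targetPointSize, dpi);
        return Math.max(1.0, targetPx / measuredPx);
    }
}
```

For example, if the text in a photo is about 20 pixels tall, upscaleFactor(20, 10, 300) suggests enlarging the image roughly twofold before passing it to the recognizer.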

As for not being able to detect more than four words, that depends on many factors: the kind of test image (and how many features it contains), the size of the image, the platform, and so on.
