Android OCR detecting digits only using popular te

I'm using tess-two, the popular Android fork of the Tesseract OCR engine: https://github.com/rmtheis/tess-two. I integrated everything and it works.

But I need to detect only digits. My code for now is:

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(pathToLngFile, langName);
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();
doSomething(recognizedText);

I looked at the FAQ here: https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?

I'm using version 3, and for that version the FAQ gives no code solution, only a command-line one, which is not relevant for an Android project (I think...). So I tried to implement the solution for versions before 3 and added this line:

baseApi.setVariable("tessedit_char_whitelist", "0123456789");

My question is what to do with init(). I don't need any language, but I still have to initialize the API, and there is no init() method that works without a language...

EDIT: To be more specific

My end goal is to read a plain document (not a pure Excel sheet) that looks like the attached picture (a header and 3 columns separated by white space).

My requirement is to make sense of the digits: to be able to separate them and determine which digits belong to which row and column.

Thanks,

Tags: android ocr tesseract tess-two

2 Answers

地球回转人心会变

#2 · 2020-07-14 05:51

I wanted to do the same. After a bit of research I decided to capture everything, text and numbers, and then keep only the numbers. This is working for me:

// Replace every run of non-digit characters with a single space
recognizedText = recognizedText.replaceAll("[^0-9]+", " ");

And now you can do whatever you want with the numbers.

For example, I use this code to split the numbers into a String array and show them in a TextView:

String[] justnumbers = recognizedText.trim().split(" "); // trims leading/trailing spaces, then splits on the single spaces left by replaceAll
YourTextView.setText(Arrays.toString(justnumbers).replaceAll("\\[|\\]", "")); // shows the numbers in the TextView, stripping the "[" and "]" added by Arrays.toString
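Note that replacing every non-digit with a space also removes the line breaks, so the row information the question asks about is lost. A minimal sketch of one way to keep it, assuming the recognized text has one document row per line and that each run of digits is one column cell (neither is guaranteed by the OCR engine). DigitGrid and parse are hypothetical names, not part of tess-two:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of tess-two.
final class DigitGrid {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Returns one inner list per text line that contains at least one digit.
    // Each inner list holds the digit runs found on that line, left to right.
    // Assumption: one document row per line of recognized text.
    static List<List<String>> parse(String recognizedText) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : recognizedText.split("\\r?\\n")) {
            List<String> cells = new ArrayList<>();
            Matcher m = DIGITS.matcher(line);
            while (m.find()) {
                cells.add(m.group());
            }
            if (!cells.isEmpty()) { // skips blank lines and digit-free lines such as a text header
                rows.add(cells);
            }
        }
        return rows;
    }
}
```

With this, DigitGrid.parse(recognizedText).get(r).get(c) would be the cell in column c of the r-th line that contains digits. A header line that itself contains digits would show up as a row and would need to be dropped separately.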

You can see it working here.

Hope this helps.


乱世女痞

#3 · 2020-07-14 05:58

I did it a bit differently. Maybe it will be useful to somebody.

First you need to initialize the API.

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(datapath, language, ocrEngineMode);

Then set the page segmentation mode and the following variables:

baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE);
baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "!?@#$%&*()<>_-+=/:;'\"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, ".,0123456789");
baseApi.setVariable("classify_bln_numeric_mode", "1");

This way the engine will recognize only digits (plus the '.' and ',' characters kept in the whitelist).
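Because '.' and ',' stay in the whitelist, the recognized line may contain separators that are not attached to any number. A minimal sketch of how the number-like tokens could be pulled out of such a line afterwards, in plain Java. NumberTokens and extract are hypothetical names, not part of tess-two, and the pattern is only one possible definition of a "number":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of tess-two.
final class NumberTokens {
    // A digit run, optionally followed by groups of '.' or ',' plus more digits,
    // e.g. "7", "12.5" or "1,024". Leading or trailing separators are not matched.
    private static final Pattern NUMBER = Pattern.compile("[0-9]+(?:[.,][0-9]+)*");

    // Returns the number-like tokens of one recognized line, left to right.
    static List<String> extract(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}
```

Whether ',' means a thousands separator or a decimal mark depends on the document, so the tokens are returned as strings and left unparsed.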

