I'm using tess-two (https://github.com/rmtheis/tess-two), the popular Android fork of the Tesseract OCR engine. I integrated everything and it works.
But I need to detect only digits. My code for now is:
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(pathToLngFile, langName);
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();
doSomething(recognizedText);
According to the FAQ (https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?), version 3, which I'm using, has no code solution for this, only a command-line one, which is not relevant for an Android project (I think). So I tried to implement the solution for versions before 3 and added this line:
baseApi.setVariable("tessedit_char_whitelist", "0123456789");
My question is what to do with init(). I don't need any language, but I still have to initialize the API, and there is no init() method without arguments.
EDIT: To be more specific
My end goal is to process a plain document (not a pure Excel sheet) that looks like the attached picture (a header and 3 columns separated by white space).
My requirement is to make sense of the digits: to be able to separate them and determine which digits belong to which row and column.
Thanks,
I wanted to do the same, and after a bit of research I decided to capture everything, text and numbers, and then keep only the numbers. This is working for me.
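A minimal sketch of that filtering step, assuming the recognized text is already available as a String (for example from baseApi.getUTF8Text()). The class and method names are only illustrative; the regular expression replaces every run of non-digit characters with a single space.

```java
final class DigitFilter {
    /** Replaces each run of non-digit characters with one space, then trims both ends. */
    static String keepDigits(String recognizedText) {
        return recognizedText.replaceAll("[^0-9]+", " ").trim();
    }
}
```

For instance, DigitFilter.keepDigits("Total: 42 items, 7 boxes") gives "42 7".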
And now you can do whatever you want with the numbers.
For example, I use this code to split all the numbers into a String array and show them in a TextView.
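Something along these lines should do it. This is a sketch, not the exact code from my app: it assumes the input is the digits-and-spaces string from the previous step, and the helper names are made up for illustration.

```java
final class NumberSplitter {
    /** Splits a whitespace-separated string of numbers; blank input gives an empty array. */
    static String[] splitNumbers(String digitsOnly) {
        String trimmed = digitsOnly.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Joins the numbers one per line, e.g. to pass to textView.setText(...) on Android. */
    static String toDisplayText(String[] numbers) {
        return String.join("\n", numbers);
    }
}
```

Checking for blank input first avoids getting an array with a single empty string when no digits were recognized.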
You can see it working here.
Hope this helps.
I made it a bit different. Maybe it will be useful for somebody.
So first you need to initialize the API.
Then set the following variables.
This way the engine will look only for numbers.
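A sketch of what that setup could look like. The two variable names, tessedit_char_whitelist and classify_bln_numeric_mode, are standard Tesseract parameters; the DigitOnlyConfig helper is hypothetical and only exists so the settings can be applied through any (name, value) setter, such as tess-two's setVariable. How strictly the numeric-mode hint is honored may depend on the engine version, so treat that part as an assumption.

```java
import java.util.function.BiConsumer;

final class DigitOnlyConfig {
    /** Applies digit-only settings through any (name, value) setter. */
    static void apply(BiConsumer<String, String> setVariable) {
        // Restrict the characters the recognizer may output to digits.
        setVariable.accept("tessedit_char_whitelist", "0123456789");
        // Hint that the input is numeric (assumption: effect depends on the engine version).
        setVariable.accept("classify_bln_numeric_mode", "1");
    }
}

// Hypothetical usage in an Android project with tess-two:
//   TessBaseAPI baseApi = new TessBaseAPI();
//   baseApi.init(pathToLngFile, langName);       // initialize first; a language is still needed
//   DigitOnlyConfig.apply(baseApi::setVariable); // then set the variables
//   baseApi.setImage(bitmap);
//   String recognizedText = baseApi.getUTF8Text();
//   baseApi.end();
```

Note that the variables are set after init(), matching the order described above.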