How to make tesseract to recognize only numbers, w

I want to use tesseract to recognize only numbers. The problem is that I have a mixture of numbers and letters, and when I use SetVariable("tessedit_char_whitelist", "0123456789"), tesseract returns a wrong digit for every symbol.

Can I set a threshold value so that tesseract omits the symbols with low resemblance?

NOTE: I set tesseract to recognize only digits so there is no confusion between O and 0.

Tags: ocr tesseract

7 answers

冷血范

#2 · 2019-03-08 03:28

For tesseract 3, I created a config file as described in the FAQ.

Either set the variable BEFORE calling an Init function, or put this in a text file called tessdata/configs/digits:

tessedit_char_whitelist 0123456789

Then it works with the command: tesseract imagename outputbase digits

唯我独甜

#3 · 2019-03-08 03:29

To match only 0-9:

tesseract myimage.png stdout -c tessedit_char_whitelist=0123456789

Or, to match a different set of characters (here some digits plus a few letters):

tesseract myimage.png stdout -c tessedit_char_whitelist=01234ABCDE
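
If you call the command-line tool from Java instead of using a binding, the same whitelist option can be passed through ProcessBuilder. This is only a minimal sketch: the class name TesseractCommand and the build helper are made up for illustration, and it assumes a tesseract executable is on the PATH.

```java
import java.util.ArrayList;
import java.util.List;

final class TesseractCommand {
    // Builds: tesseract <imagePath> stdout -c tessedit_char_whitelist=<whitelist>
    // (hypothetical helper, not part of any tesseract library)
    static List<String> build(String imagePath, String whitelist) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add("stdout");
        cmd.add("-c");
        cmd.add("tessedit_char_whitelist=" + whitelist);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = build("myimage.png", "0123456789");
        // inheritIO() forwards tesseract's stdout/stderr to this process's console.
        int exitCode = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.exit(exitCode);
    }
}
```

Passing each argument as a separate list element avoids shell quoting problems if the whitelist ever contains characters the shell would interpret.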

beautiful°

#4 · 2019-03-08 03:30

For tesseract 3, the command is simpler according to the FAQ: tesseract imagename outputbase digits. But it doesn't work very well for me.

So I tried different psm options and found that -psm 6 works best for my case.

See man tesseract for details.

唯我独甜

#5 · 2019-03-08 03:33

What I do is recognize everything and then, once I have the text, strip out every character except the digits:

// Replace every run of non-digit characters with a single space
recognizedText = recognizedText.replaceAll("[^0-9]+", " ");

This works pretty well for me.
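
A variation on the same idea, in case you want the individual numbers rather than one space-separated string: collect each run of digits with a regex matcher. This is only a sketch; the class name DigitFilter and the sample text are made up for illustration, and recognizedText stands in for whatever string your OCR call returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitFilter {
    private static final Pattern DIGIT_RUN = Pattern.compile("[0-9]+");

    // Returns every maximal run of ASCII digits in the text, in order of appearance.
    static List<String> digitRuns(String recognizedText) {
        List<String> runs = new ArrayList<>();
        Matcher m = DIGIT_RUN.matcher(recognizedText);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Sample string standing in for real OCR output.
        String recognizedText = "Total: 42 items, ref A17-003";
        System.out.println(digitRuns(recognizedText)); // prints [42, 17, 003]
    }
}
```

Unlike the replaceAll version, this leaves no leading or trailing spaces to trim and keeps leading zeros, since the runs stay as strings.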

smile是对你的礼貌

#6 · 2019-03-08 03:41

You can instruct tesseract to use only digits, and if that is not accurate enough, the best chance of getting better results is to go through the training process: http://www.resolveradiologic.com/blog/2013/01/15/training-tesseract/

Evening l夕情丶

#7 · 2019-03-08 03:44

I did it a bit differently (with tess-two). Maybe it will be useful for somebody.

First, initialize the API:

TessBaseAPI baseApi = new TessBaseAPI();
baseApi.init(datapath, language, ocrEngineMode);

Then set the following variables:

baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE);
baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "!?@#$%&*()<>_-+=/:;'\"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, ".,0123456789");
baseApi.setVariable("classify_bln_numeric_mode", "1");

This way the engine only considers digits (plus the "." and "," allowed by the whitelist).
