Limit characters tesseract is looking for

Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.

Tags: ocr tesseract

4 answers

Emotional °昔

#2 · 2019-01-05 09:40

To use a whitelist, whether from a config file or via the -c tessedit_char_whitelist=... command-line switch, in the newest 4.0 version you have to set the OCR Engine mode to "Original Tesseract only" (--oem 0). This is because the new "Neural nets LSTM" mode doesn't respect the whitelist setting. Example of a proper command line for version 4.0:

tesseract input_file output_file --oem 0 -c tessedit_char_whitelist=abc123
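If you call tesseract from a script, the same command line can be assembled as an argument list. This is only a rough sketch: the helper names build_tesseract_cmd and run_tesseract are made up for illustration, and it assumes the tesseract binary is on your PATH with traineddata that supports the legacy engine.

```python
import shlex
import subprocess


def build_tesseract_cmd(input_file, output_base, whitelist, oem=0):
    """Return the argv list for a whitelist-restricted tesseract run.

    Hypothetical helper, not part of tesseract itself.
    """
    return [
        "tesseract",
        str(input_file),
        str(output_base),
        "--oem", str(oem),  # 0 selects "Original Tesseract only"
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_tesseract(cmd):
    """Run the command; raises CalledProcessError if tesseract fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_tesseract_cmd("input_file", "output_file", "abc123")
    # Print the command instead of running it, so this works without tesseract.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems if the whitelist ever contains characters the shell treats specially.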

UPDATE: In newer versions (4.0), the eng.traineddata file installed by default by the Windows installer and some Linux installers is corrupted. A temporary solution is to replace the tessdata\eng.traineddata file with the one from an older version. This file should be about 30MB. Otherwise you'll get the error "Tesseract couldn't load any languages!" or similar.

男人必须洒脱

#3 · 2019-01-05 09:43

Just adding this for anyone using tesseract on Android. In your readOCR function, where you set the language etc., add the following line:

tesseract.setVariable("tessedit_char_whitelist","ABCDEFGHIJKLMNOPQRSTUVWXYZ");

You can also set a blacklist (tessedit_char_blacklist) for characters to exclude.

▲ chillily

#4 · 2019-01-05 09:51

In addition to the config file, there is the -c flag:

tesseract stdin stdout -c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyz -psm 6


Juvenile、少年°

#5 · 2019-01-05 09:53

Create a config file (e.g. "letters") in the tessdata/configs directory, usually
/usr/share/tesseract/tessdata/configs
or
/usr/share/tesseract-ocr/tessdata/configs

And add this line to the config file:

tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz

...or maybe [a-z] works.. dunno :-)
Then call tesseract similar to this:

tesseract input.tif output nobatch letters

That will limit tesseract to recognizing only the wanted characters.
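If you would rather generate the config file from a script, something like the following should do it. Treat it as a sketch: write_whitelist_config is a made-up helper name, and the demo writes into a temporary directory rather than the real tessdata/configs, which usually needs root permissions.

```python
import string
import tempfile
from pathlib import Path


def write_whitelist_config(configs_dir, name, whitelist):
    """Write a one-line tesseract config file and return its path.

    Hypothetical helper; configs_dir should be your tessdata/configs directory.
    """
    path = Path(configs_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    # Same line as in the answer above: variable name, a space, then the value.
    path.write_text(f"tessedit_char_whitelist {whitelist}\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo only: a temporary directory stands in for tessdata/configs.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_whitelist_config(tmp, "letters", string.ascii_lowercase)
        print(cfg.read_text(encoding="utf-8"), end="")
```

Spelling out the characters with string.ascii_lowercase sidesteps the question of whether a range like [a-z] is understood.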
