Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.
To use the whitelist in a config file or with the
-c tessedit_char_whitelist=...
command-line switch in version 4.0, you have to set the OCR Engine mode to "Original Tesseract only". This is because the new "Neural nets LSTM" mode doesn't respect the whitelist setting. Example of a proper command line for version 4.0:
tesseract input.png output --oem 0 -c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyz

UPDATE: In newer versions (4.0), some Windows and Linux installers install a corrupted eng.traineddata file by default. A temporary solution is to replace the tessdata\eng.traineddata file with one from an older version; that file should be about 30MB. Otherwise you'll get an error such as "Tesseract couldn't load any languages!".

Just adding this for anyone using tesseract on Android: in your readOCR function, where you set the language etc., also set the tessedit_char_whitelist variable on your TessBaseAPI instance (for example with setVariable) to the characters you want to allow.
You can also set a blacklist (the tessedit_char_blacklist variable) for characters to exclude.
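The whitelist and blacklist switches above can be sketched in Python as follows. This is a minimal sketch assuming the tesseract binary is on PATH and that input.png exists; build_tesseract_cmd is a hypothetical helper, and only --oem, -c, tessedit_char_whitelist and tessedit_char_blacklist come from Tesseract itself.

```python
import os
import shutil
import string
import subprocess


def build_tesseract_cmd(image, output_base, whitelist=None, blacklist=None, oem=0):
    """Return a tesseract argv list (hypothetical helper, not a Tesseract API)."""
    cmd = ["tesseract", image, output_base]
    if oem is not None:
        # --oem 0 selects "Original Tesseract only", the mode the answer
        # above says honours the whitelist in 4.0.
        cmd += ["--oem", str(oem)]
    if whitelist:
        # Only these characters may appear in the recognized text.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    if blacklist:
        # These characters are excluded from the recognized text.
        cmd += ["-c", f"tessedit_char_blacklist={blacklist}"]
    return cmd


if __name__ == "__main__":
    cmd = build_tesseract_cmd("input.png", "output", whitelist=string.ascii_lowercase)
    print(" ".join(cmd))
    # Only run it when tesseract is installed and the sample image exists;
    # on success tesseract writes the recognized text to output.txt.
    if shutil.which("tesseract") and os.path.exists("input.png"):
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than one shell string means characters the shell would otherwise interpret (such as $ or *) reach tesseract unchanged.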
In addition to the config file approach, there is the -c flag described above (-c tessedit_char_whitelist=...).

To use a config file instead, create one (e.g. "letters") in the tessdata/configs directory, usually /usr/share/tesseract/tessdata/configs or /usr/share/tesseract-ocr/tessdata/configs.
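Then add this line to the config file:
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz
Note that a range such as [a-z] is not expanded: every allowed character has to be listed explicitly.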
Then call tesseract with the config name after the output base, similar to this:
tesseract image.png output letters
That limits tesseract to recognizing only the listed characters.
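A minimal Python sketch of the config-file steps above, assuming you have write access to the tessdata/configs directory. The helper names config_text and write_config are hypothetical; the demo writes into a temporary directory instead of the real configs directory so it runs without root.

```python
import os
import string
import tempfile


def config_text(allowed):
    # A Tesseract config file holds "variable value" pairs, one per line.
    # Ranges like [a-z] are not expanded, so every character is listed.
    return f"tessedit_char_whitelist {allowed}\n"


def write_config(configs_dir, name, allowed):
    """Write config file `name` into `configs_dir` and return its path."""
    path = os.path.join(configs_dir, name)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(config_text(allowed))
    return path


if __name__ == "__main__":
    # Demo in a temporary directory; the real target would be a configs
    # directory such as /usr/share/tesseract-ocr/tessdata/configs.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_config(tmp, "letters", string.ascii_lowercase)
        with open(path, encoding="utf-8") as fh:
            print(fh.read(), end="")
    # Once "letters" is in tessdata/configs, pass its name after the output base.
    print("tesseract image.png output letters")
```

Using string.ascii_lowercase builds the explicit a-z character list, which sidesteps the fact that the config value is read literally.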