How to reproduce:
- Create a new image in Paint (any size)
- Add the letter A to this image
- Try to recognize it -> Tesseract will not find any letters
- Copy-paste this letter 5-6 times into the image
- Try to recognize it again -> Tesseract will find all the letters
Why?
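The reproduction steps can also be scripted instead of done by hand in Paint. This is only a minimal sketch, assuming Pillow and pytesseract are installed; the function names and the font_path argument are hypothetical, and font_path must point to a TrueType font that exists on your system.

```python
def sample_text(copies):
    """Return the text drawn on the test image: `copies` letters 'A', space-separated."""
    return " ".join(["A"] * copies)


def make_image(text, font_path, font_size=48):
    """Draw black text on a white canvas sized roughly to fit it (assumes Pillow)."""
    from PIL import Image, ImageDraw, ImageFont  # imported lazily; third-party

    image = Image.new("RGB", (100 + 40 * len(text), 150), "white")
    font = ImageFont.truetype(font_path, font_size)
    ImageDraw.Draw(image).text((50, 40), text, fill="black", font=font)
    return image


def recognize(image, psm=None):
    """Run Tesseract on the image; psm=None keeps Tesseract's default mode."""
    import pytesseract  # imported lazily; needs the tesseract binary installed

    config = "" if psm is None else f"--psm {psm}"
    return pytesseract.image_to_string(image, config=config).strip()
```

Comparing recognize(make_image(sample_text(1), font_path)) with recognize(make_image(sample_text(6), font_path)) should show whether the single letter is missed while the repeated letters are found.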
You must set the "page segmentation mode" to "single char".
For example, in Android you do the following:
api.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_CHAR);
You need to set Tesseract's page segmentation mode to "single character."
Have you seen this?
https://code.google.com/p/tesseract-ocr/issues/detail?id=581
The bug list shows it as "no longer an issue".
baseApi.setVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
Put this code before initializing Tesseract.
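The Python code for that configuration looks like this:
import pytesseract
import cv2

# Load the image; pytesseract accepts the NumPy array that OpenCV returns.
img = cv2.imread("path to some image")

# Adjacent string literals are joined into a single config string.
text = pytesseract.image_to_string(
    img,
    config="-c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
           " --psm 10"  # page segmentation mode: single character
           " -l osd",   # language data to load, as in the original answer
)
print(text)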
The --psm flag defines the page segmentation mode. According to the Tesseract documentation, 10 means:
Treat the image as a single character.
So to recognize a single character, you just need to use the --psm 10 flag.
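For context, Tesseract's default mode is 3 (fully automatic page segmentation), which expects page-like input, and that is a likely reason a lone letter is missed. Below is a small sketch of a helper that assembles the config string for different modes. The names build_config and pick_psm are hypothetical, and the rule inside pick_psm is an illustrative assumption, not something Tesseract prescribes.

```python
# Page segmentation mode numbers as listed by `tesseract --help-psm`.
PSM_AUTO = 3          # default: fully automatic page segmentation
PSM_SINGLE_LINE = 7   # treat the image as a single text line
PSM_SINGLE_WORD = 8   # treat the image as a single word
PSM_SINGLE_CHAR = 10  # treat the image as a single character


def build_config(psm, whitelist=None):
    """Assemble a Tesseract config string from a mode and an optional whitelist."""
    parts = []
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    parts.append(f"--psm {psm}")
    return " ".join(parts)


def pick_psm(expected_chars):
    """Hypothetical rule: single-char mode for one character, single-line otherwise."""
    return PSM_SINGLE_CHAR if expected_chars == 1 else PSM_SINGLE_LINE
```

The result can be passed straight to pytesseract, for example pytesseract.image_to_string(img, config=build_config(pick_psm(1))).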