Does anyone know how to set a character whitelist for pytesseract? I want it to output only letters (A-z) and digits (0-9). Is this possible? I have the following:
from PIL import Image
import pytesseract

img = Image.open('test.jpg')
result = pytesseract.image_to_string(img, config='-psm 6')
I'm getting other characters, such as / in place of 1, so I would like to restrict the set of characters it can output.
You can accomplish that with the line below. Alternatively, you can set up a config file for tesseract to do the same thing; see "Limit characters tesseract is looking for".
pytesseract.image_to_string(question_img, config="-c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyz -psm 6")
I am sure there are other ways to get it to work, but this is what worked for me.
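If it helps, here is a minimal sketch of the same idea with the config string built in one place. The helper names build_whitelist_config and ocr_with_whitelist are hypothetical (not part of pytesseract), and including uppercase letters in the default whitelist is my assumption about what the question wants. One thing to watch: Tesseract 3.x accepts -psm, while Tesseract 4 and later expect --psm, so the sketch lets you pick either spelling.

```python
import string


def build_whitelist_config(chars=string.ascii_letters + string.digits,
                           psm=6, legacy_flags=False):
    """Build a tesseract config string that restricts the output characters.

    legacy_flags=True emits "-psm" (Tesseract 3.x spelling);
    the default emits "--psm" (Tesseract 4 and later).
    """
    psm_flag = "-psm" if legacy_flags else "--psm"
    return f"{psm_flag} {psm} -c tessedit_char_whitelist={chars}"


def ocr_with_whitelist(image_path, **config_kwargs):
    """Hypothetical wrapper: run OCR on image_path with a whitelist applied."""
    # Imported here so the config helper can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(
            img, config=build_whitelist_config(**config_kwargs)
        )


if __name__ == "__main__":
    # Show the config strings that would be passed to tesseract.
    print(build_whitelist_config())
    print(build_whitelist_config(legacy_flags=True))
```

You would then call something like ocr_with_whitelist('test.jpg') or, on an older install, ocr_with_whitelist('test.jpg', legacy_flags=True). Whether the whitelist is honored can depend on your Tesseract version and OCR engine, so it is worth checking the output on a sample image.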