Pytesseract: UnicodeDecodeError: 'charmap'

Posted 2019-01-28 10:36

I'm running OCR on a large number of screenshots with Pytesseract. This works well in most cases, but a small number of images cause this error:

pytesseract.image_to_string(image,None, False, "-psm 6")
Pytesseract: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 2: character maps to <undefined>

I'm using Python 3.4. Any suggestions on how I can prevent this error from happening (other than just wrapping the call in a try/except) would be very helpful.

Tags: python-3.x tesseract python-unicode python-tesseract

2 answers

Deceive 欺骗

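#2 · 2019-01-28 11:08

Use Unidecode to transliterate the recognized text to ASCII:

from PIL import Image
from unidecode import unidecode
import pytesseract

# OCR the image, then replace any non-ASCII characters with ASCII equivalents
strs = pytesseract.image_to_string(Image.open('binarized_image.png'))
strs = unidecode(strs)
print(strs)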


祖国的老花朵

#3 · 2019-01-28 11:14

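Make sure you are using the right decoding options.
See https://docs.python.org/3/library/codecs.html#standard-encodings

In Python 3 only bytes objects have a decode method (str does not), so decode the raw bytes with the encoding they were written in:

bytes.decode('utf-8')
bytes.decode('cp950') for Traditional Chinese, etc.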
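
To make the point about decoding options concrete, here is a minimal sketch using only the standard library. The sample bytes are an assumption chosen for illustration (the question does not show the actual OCR output): they are the UTF-8 encoding of a right double quotation mark followed by ASCII text, which places 0x9d at position 2 as in the reported error. Decoding them with the Windows code page cp1252 fails, while decoding them as UTF-8 succeeds.

```python
# Assumed sample: UTF-8 bytes for U+201D (right double quote), then ASCII text.
raw = b"\xe2\x80\x9dhello"

# Decoding with a Windows code page fails because 0x9d is unmapped in cp1252.
try:
    raw.decode("cp1252")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("utf-8")

# If the encoding is uncertain, errors="replace" substitutes U+FFFD
# for undecodable bytes instead of raising.
lossy = raw.decode("cp1252", errors="replace")

print(failure)
print(repr(text))
print(repr(lossy))
```

Whether this matches what happens inside a given pytesseract version is not confirmed by the thread; the sketch only shows why the choice of codec matters for bytes like these.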
