Pytesseract OCR multiple config options

I am having some problems with pytesseract. I need to configure Tesseract so that it treats the image as a single digit and also accepts only numbers, because the number zero is often confused with the letter 'O'.

Like this:

target = pytesseract.image_to_string(im,config='-psm 7',config='outputbase digits')

Many thanks,

Niall

Tags: python ocr tesseract

2 answers

骚年 ilove

Reply #2 · 2019-03-10 04:32

The reason you are having trouble is that character restriction (the whitelist) does not work in Tesseract 4.0 with the default engine. You have to force the legacy engine (--oem 0) to make it limit the characters it recognizes. This is a known bug that the Tesseract team has not yet fixed.
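Following this answer, a minimal sketch that forces the legacy engine together with a digit whitelist might look like this. The helper names `legacy_digit_config` and `ocr_digits` are hypothetical, and the sketch assumes your installed `eng.traineddata` actually contains the legacy engine model (an LSTM-only traineddata file cannot run with `--oem 0`).

```python
DIGITS = "0123456789"


def legacy_digit_config(psm: int = 10) -> str:
    """Build one Tesseract config string that forces the legacy engine
    (--oem 0) and restricts recognition to the digits 0-9.
    Hypothetical helper; psm=10 means 'single character'."""
    return f"--oem 0 --psm {psm} -c tessedit_char_whitelist={DIGITS}"


def ocr_digits(path: str, psm: int = 10) -> str:
    """Run OCR on one image file using the legacy-engine digit config.

    The imports are local so the config helper above can be used
    without pytesseract or Pillow installed.
    """
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        text = pytesseract.image_to_string(image, config=legacy_digit_config(psm))
    return text.strip()


if __name__ == "__main__":
    print(legacy_digit_config())
    # prints: --oem 0 --psm 10 -c tessedit_char_whitelist=0123456789
```

For a whole line of digits rather than one character, `ocr_digits(path, psm=7)` would keep the same engine and whitelist and only change the segmentation mode.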


在下西门庆

Reply #3 · 2019-03-10 04:38

tesseract-4.0.0a supports the page segmentation modes (psm) below. If you want single-character recognition, set psm to 10. If your text consists of digits only, you can also set tessedit_char_whitelist=0123456789.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
                        bypassing hacks that are Tesseract-specific.
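To pick a mode from the table above without scattering magic numbers through your code, one option is a small lookup plus a builder that rejects values outside 0 to 13 and joins every option into a single config string. This is a sketch: the names `PSM` and `build_config` are hypothetical, and only the modes relevant to the question are named.

```python
# Subset of the page segmentation modes listed above (hypothetical names).
PSM = {
    "single_line": 7,
    "single_word": 8,
    "single_char": 10,
}


def build_config(psm: int, oem: int = 3, whitelist: str = "") -> str:
    """Combine several Tesseract options into ONE config string."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    options = [f"--psm {psm}", f"--oem {oem}"]
    if whitelist:
        options.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(options)


print(build_config(PSM["single_char"], whitelist="0123456789"))
# prints: --psm 10 --oem 3 -c tessedit_char_whitelist=0123456789
```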

Here is a sample usage of image_to_string with multiple parameters.

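import pytesseract
from PIL import Image
image = Image.open('digit.png')  # hypothetical input image; recent pytesseract has no boxes= parameter
target = pytesseract.image_to_string(image, lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')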
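Passing every option through one `config` string is required, not just a style choice: Python rejects a call that repeats a keyword argument, such as the two `config=` arguments in the question, before the call ever runs. The check below shows this with the built-in `compile()`, so it needs neither pytesseract nor Tesseract; the variable names are illustrative only.

```python
# The question's call as source text: `config` is passed twice.
question_call = "pytesseract.image_to_string(im, config='-psm 7', config='outputbase digits')"

try:
    compile(question_call, "<question>", "eval")
    repeated_ok = True
except SyntaxError:
    # Python refuses to compile a call with a repeated keyword argument.
    repeated_ok = False

print("repeated config= compiles:", repeated_ok)  # prints: repeated config= compiles: False

# The working pattern: join all options into a single config string.
options = ["--psm 10", "--oem 3", "-c tessedit_char_whitelist=0123456789"]
merged = " ".join(options)
print(merged)  # prints: --psm 10 --oem 3 -c tessedit_char_whitelist=0123456789
```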

Hope this helps.
