Python PIL to extract number from image

Posted 2019-07-21 10:18

Question:

I have an image like this one:

and I would like to turn it into a black number on a white background so that I can use OCR to recognise it. How could I achieve that in Python?

Many thanks,

John.

Answer 1:

If you just want to turn a white-on-black image into a black-on-white one, that's trivial; just use ImageOps.invert:

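from PIL import Image, ImageOps

img = Image.open('zero.jpg')       # white digit on a black background
inverted = ImageOps.invert(img)    # flip every pixel value: black <-> white
inverted.save('invzero.png')       # PNG is lossless, so no new JPEG artifacts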

If you also want to do some basic processing like increasing the contrast, see the other functions in the ImageOps module, such as autocontrast. They're all pretty easy to use, but if you get stuck, you can always ask a new question. For more complex enhancements, look around the rest of PIL: ImageEnhance can be used to sharpen an image, ImageFilter can do edge detection and unsharp masking, and so on. You may also want to change the format to greyscale (mode 'L', 8 bits per pixel), or even black and white (mode '1', 1 bit per pixel); both conversions are done with the Image.convert method.
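As a rough illustration of how the ImageOps and Image.convert steps mentioned above could be chained, here is a minimal sketch. The function name prepare_for_ocr and the threshold parameter are made up for this example, and the cut-off of 128 is only an assumed starting point; the right value depends on your image.

```python
from PIL import Image, ImageOps


def prepare_for_ocr(img, threshold=128):
    """Return a black-on-white, two-level version of a white-on-black image.

    `prepare_for_ocr` and `threshold` are hypothetical names for this sketch.
    """
    grey = img.convert('L')                  # 8-bit greyscale
    stretched = ImageOps.autocontrast(grey)  # spread values over the full 0-255 range
    inverted = ImageOps.invert(stretched)    # white-on-black -> black-on-white
    # Map every pixel to pure black or pure white, then store it as 1-bit.
    two_level = inverted.point(lambda p: 255 if p >= threshold else 0)
    return two_level.convert('1')


if __name__ == '__main__':
    # Synthetic stand-in for the question's image: a light bar on a dark background.
    sample = Image.new('RGB', (20, 20), (30, 30, 30))
    sample.paste((220, 220, 220), (8, 4, 12, 16))
    result = prepare_for_ocr(sample)
    print(result.mode, result.getpixel((0, 0)), result.getpixel((10, 10)))
```

With a real file you would call prepare_for_ocr(Image.open('zero.jpg')) and save the result as PNG. Thresholding before converting to mode '1' avoids the dithering that convert('1') would otherwise apply to intermediate grey values.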

Of course, you have to know what processing you want to do. One thing you might want to try is playing around with the image in Photoshop or GIMP and keeping track of what operations you do, then looking for how to implement those operations in PIL. (It might be simpler to just use GIMP's own scripting, Script-Fu or Python-Fu, in the first place instead of trying to use PIL.)



Answer 2:

You don't necessarily need to manipulate the image before OCR. For example, you could just use pytesser on the original file:

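from PIL import Image
from pytesser import image_to_string  # pytesser wraps the Tesseract OCR engine

im = Image.open('wjNL6.jpg')
text = image_to_string(im)   # run OCR on the unmodified image
print(text)                  # parentheses work on both Python 2 and 3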

Output:

0
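pytesser dates from the Python 2 era, so on a current setup you may need a different wrapper. The sketch below is not part of the original answer: it assumes the separate pytesseract package and the Tesseract binary are installed, and read_single_digit and its ocr parameter are hypothetical names. The OCR call is passed in as a callable so the digit filtering can be tried without Tesseract.

```python
from PIL import Image


def read_single_digit(img, ocr=None):
    """OCR `img` and return only the digit characters that were found.

    `ocr` is any callable that takes a PIL image and returns text. When it is
    omitted, this sketch assumes pytesseract and Tesseract are installed.
    """
    if ocr is None:
        import pytesseract  # assumption: swapped in for pytesser

        def ocr(im):
            # --psm 10 asks Tesseract to treat the image as a single character.
            return pytesseract.image_to_string(im, config='--psm 10')

    text = ocr(img)
    # OCR output often carries whitespace or stray symbols; keep digits only.
    return ''.join(ch for ch in text if ch.isdigit())
```

Typical use would be read_single_digit(Image.open('wjNL6.jpg')). If the result comes back empty, inverting or thresholding the image first is worth a try.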