I need to use pytesseract to extract text from this picture:
and the code:
from PIL import Image, ImageEnhance, ImageFilter
import pytesseract

path = 'pic.gif'
img = Image.open(path)
img = img.convert('RGBA')
pix = img.load()
for y in range(img.size[1]):
    for x in range(img.size[0]):
        if pix[x, y][0] < 102 or pix[x, y][1] < 102 or pix[x, y][2] < 102:
            pix[x, y] = (0, 0, 0, 255)
        else:
            pix[x, y] = (255, 255, 255, 255)
img.save('temp.jpg')
text = pytesseract.image_to_string(Image.open('temp.jpg'))
# os.remove('temp.jpg')
print(text)
Not bad, but the printed result is ",2 WW", not the correct text "2HHH". How can I remove those black dots?
You only need to enlarge the picture with cv2.resize.
My original 200x40 picture gave: HZUBS
The same picture resized to 1400x300 gave: A 1234 (which is correct)
After that, adjust the parameters to improve the results.
To extract the text directly from the web, you can try the following implementation (using the first image):
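As a rough sketch of reading an image straight from a URL (this is not the answerer's original code), something like the following might work. The helper names `image_from_bytes`, `fetch_image` and `ocr_from_url` are hypothetical, and the URL must be supplied by the caller.

```python
import io
import urllib.request

from PIL import Image


def image_from_bytes(data: bytes) -> Image.Image:
    """Decode raw image bytes into a greyscale Pillow image."""
    return Image.open(io.BytesIO(data)).convert("L")


def fetch_image(url: str, timeout: float = 10.0) -> Image.Image:
    """Download the image at `url` without writing a temporary file."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return image_from_bytes(response.read())


def ocr_from_url(url: str) -> str:
    """Download an image and run Tesseract on it."""
    # Imported here so the download helpers work without Tesseract installed.
    import pytesseract

    return pytesseract.image_to_string(fetch_image(url))
```

Decoding from memory avoids the temp.jpg round trip used in the question.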
I have a somewhat different pytesseract approach to share with the community. Here is my approach:
Here is my small improvement, which removes noise and arbitrary lines within a certain colour range.
Here is my solution: