I am using tesseract to perform OCR on screengrabs. I have an app using a tkinter window leveraging self.after in the initialization of my class to perform constant image scrapes and update label, etc values in the tkinter window. I have searched for multiple days and can't find any specific examples how to leverage CREATE_NO_WINDOW with Python3.6 on a Windows platform calling tesseract with pytesseract.
This is related to this question:
How can I hide the console window when I run tesseract with pytesser
I have only been programming Python for 2 weeks and don't understand what/how to perform the steps in the above question. I opened up the pytesseract.py file and reviewed and found the proc = subprocess.Popen(command, stderr=subproces.PIPE) line but when I tried editing it I got a bunch of errors that I couldn't figure out.
#!/usr/bin/env python
'''
Python-tesseract. For more information: https://github.com/madmaze/pytesseract
'''
try:
import Image
except ImportError:
from PIL import Image
import os
import sys
import subprocess
import tempfile
import shlex
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'
__all__ = ['image_to_string']
def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False,
config=None):
'''
runs the command:
`tesseract_cmd` `input_filename` `output_filename_base`
returns the exit status of tesseract, as well as tesseract's stderr output
'''
command = [tesseract_cmd, input_filename, output_filename_base]
if lang is not None:
command += ['-l', lang]
if boxes:
command += ['batch.nochop', 'makebox']
if config:
command += shlex.split(config)
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
status = proc.wait()
error_string = proc.stderr.read()
proc.stderr.close()
return status, error_string
def cleanup(filename):
''' tries to remove the given filename. Ignores non-existent files '''
try:
os.remove(filename)
except OSError:
pass
def get_errors(error_string):
'''
returns all lines in the error_string that start with the string "error"
'''
error_string = error_string.decode('utf-8')
lines = error_string.splitlines()
error_lines = tuple(line for line in lines if line.find(u'Error') >= 0)
if len(error_lines) > 0:
return u'\n'.join(error_lines)
else:
return error_string.strip()
def tempnam():
''' returns a temporary file-name '''
tmpfile = tempfile.NamedTemporaryFile(prefix="tess_")
return tmpfile.name
class TesseractError(Exception):
def __init__(self, status, message):
self.status = status
self.message = message
self.args = (status, message)
def image_to_string(image, lang=None, boxes=False, config=None):
'''
Runs tesseract on the specified image. First, the image is written to disk,
and then the tesseract command is run on the image. Tesseract's result is
read, and the temporary files are erased.
Also supports boxes and config:
if boxes=True
"batch.nochop makebox" gets added to the tesseract call
if config is set, the config gets appended to the command.
ex: config="-psm 6"
'''
if len(image.split()) == 4:
# In case we have 4 channels, lets discard the Alpha.
# Kind of a hack, should fix in the future some time.
r, g, b, a = image.split()
image = Image.merge("RGB", (r, g, b))
input_file_name = '%s.bmp' % tempnam()
output_file_name_base = tempnam()
if not boxes:
output_file_name = '%s.txt' % output_file_name_base
else:
output_file_name = '%s.box' % output_file_name_base
try:
image.save(input_file_name)
status, error_string = run_tesseract(input_file_name,
output_file_name_base,
lang=lang,
boxes=boxes,
config=config)
if status:
errors = get_errors(error_string)
raise TesseractError(status, errors)
f = open(output_file_name, 'rb')
try:
return f.read().decode('utf-8').strip()
finally:
f.close()
finally:
cleanup(input_file_name)
cleanup(output_file_name)
def main():
if len(sys.argv) == 2:
filename = sys.argv[1]
try:
image = Image.open(filename)
if len(image.split()) == 4:
# In case we have 4 channels, lets discard the Alpha.
# Kind of a hack, should fix in the future some time.
r, g, b, a = image.split()
image = Image.merge("RGB", (r, g, b))
except IOError:
sys.stderr.write('ERROR: Could not open file "%s"\n' % filename)
exit(1)
print(image_to_string(image))
elif len(sys.argv) == 4 and sys.argv[1] == '-l':
lang = sys.argv[2]
filename = sys.argv[3]
try:
image = Image.open(filename)
except IOError:
sys.stderr.write('ERROR: Could not open file "%s"\n' % filename)
exit(1)
print(image_to_string(image, lang=lang))
else:
sys.stderr.write('Usage: python pytesseract.py [-l lang] input_file\n')
exit(2)
if __name__ == '__main__':
main()
The code I am leveraging is similar to the example in the similar question:
def get_string(img_path):
# Read image with opencv
img = cv2.imread(img_path)
# Convert to gray
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply dilation and erosion to remove some noise
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
# Write image after removed noise
cv2.imwrite(src_path + "removed_noise.png", img)
# Apply threshold to get image with only black and white
# Write the image after apply opencv to do some ...
cv2.imwrite(src_path + "thres.png", img)
# Recognize text with tesseract for python
result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))
return result
When it gets to the following line, there is a flash of a black console window for less than a second and then it closes when it runs the command.
result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))
Here is the picture of the console window:
Here is what is suggested from the other question:
You're currently working in IDLE, in which case I don't think it really matters if a console window pops up. If you're planning to develop a GUI app with this library, then you'll need to modify the subprocess.Popen call in pytesser.py to hide the console. I'd first try the CREATE_NO_WINDOW process creation flag. – eryksun
I would greatly appreciate any help for how to modify the subprocess.Popen call in the pytesseract.py library file using CREATE_NO_WINDOW. I am also not sure of the difference between pytesseract.py and pytesser.py library files. I would leave a comment on the other question to ask for clarification but I can't until I have more reputation on this site.
I did more research and decided to learn more about subprocess.Popen:
Documentation for subprocess
I also referenced the following articles:
using python subprocess.popen..can't prevent exe stopped working prompt
I changed the original line of code in pytesseract.py:
to the following:
I ran the code and got the following error:
I then defined the CREATE_NO_WINDOW variable:
I got the value of 0x08000000 from the above linked article. After adding the definition I ran the application and I didn't get any more console window popups.