可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I'm trying to run a basic and very simple code in python.
from PIL import Image
import pytesseract
im = Image.open("sample1.jpg")
text = pytesseract.image_to_string(im, lang = 'eng')
print(text)
This is what it looks like, I have actually installed tesseract for windows through the installer. I'm very new to Python, and I'm unsure how to proceed?
Any guidance here would be very helpful. I've tried restarting my Spyder application but to no avail.
回答1:
I see steps are scattered in different answers. Based on my recent experience with this pytesseract error on Windows, writing different steps in sequence to make it easier to resolve the error:
1. Install tesseract using windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki
2. Note the tesseract path from the installation.Default installation path at the time the time of this edit was: C:\Users\USER\AppData\Local\Tesseract-OCR
. It may change so please check the installation path.
3. pip install pytesseract
4. set the tesseract path in the script before calling image_to_string
:
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
回答2:
First you should install binary:
On Linux
sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install libtesseract-dev
On Mac
brew install tesseract
On Windows
download binary from https://github.com/UB-Mannheim/tesseract/wiki. then add pytesseract.pytesseract.tesseract_cmd = 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
to your script.
Then you should install python package using pip:
pip install tesseract
pip install tesseract-ocr
references:
https://pypi.org/project/pytesseract/ (INSTALLATION section) and
https://github.com/tesseract-ocr/tesseract/wiki#installation
回答3:
In windows:
pip install tesseract
pip install tesseract-ocr
and check the file which is stored in your system usr/appdata/local/programs/site-pakages/python/python36/lib/pytesseract/pytesseract.py file
and compile the file
回答4:
From https://pypi.org/project/pytesseract/ :
pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
回答5:
For Windows Only
1 - You need to have Tesseract OCR installed on your computer.
get it from here.
https://github.com/UB-Mannheim/tesseract/wiki
Download the suitable version.
2 - Save the tesseract path to your System Environment. i.e. Edit system variables.
3 - pip install pytesseract and pip install tesseract
4 - Add this line to your python script
pytesseract.pytesseract.tesseract_cmd = 'C:\\OCR\\Tesseract-OCR\\tesseract.exe'
^ your path may be different.
5 - Runt the code.
回答6:
you can install this package...
https://github.com/UB-Mannheim/tesseract/wiki
after that you should go this path C:\Program Files (x86)\Tesseract-OCR\ tesseract.exe
then run tesseract file.
I think this will help you...
回答7:
On mac(works for me)
brew install tesseract
回答8:
You would be needing to install tesseract.
https://github.com/tesseract-ocr/tesseract/wiki
Check out the above documentation on the installation.
回答9:
On Windows 64 bits, just add the following to the PATH environment variable:
"C:\Program Files\Tesseract-OCR"
and it will work.
回答10:
Use the following command to install tesseract
pip install tesseract
回答11:
Step 1:
Install tesseract on your system as per the OS.
Latest installers can be found at https://github.com/UB-Mannheim/tesseract/wiki
Step 2:
Install the following dependency libraries using :
pip install pytesseract
pip install opencv-python
pip install numpy
Step 3:
Sample code
import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import image_to_string
# Path of working folder on Disk Replace with your working folder
src_path = "C:\\Users\\<user>\\PycharmProjects\\ImageToText\\input\\"
# If you don't have tesseract executable in your PATH, include the
following:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-
OCR/tesseract'
TESSDATA_PREFIX = 'C:/Program Files (x86)/Tesseract-OCR'
def get_string(img_path):
# Read image with opencv
img = cv2.imread(img_path)
# Convert to gray
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply dilation and erosion to remove some noise
kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)
# Write image after removed noise
cv2.imwrite(src_path + "removed_noise.png", img)
# Apply threshold to get image with only black and white
#img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
# Write the image after apply opencv to do some ...
cv2.imwrite(src_path + "thres.png", img)
# Recognize text with tesseract for python
result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))
# Remove template file
#os.remove(temp)
return result
print('--- Start recognize text from image ---')
print(get_string(src_path + "image.png") )
print("------ Done -------")
回答12:
# {Windows 10 instructions}
# before you use the script you need to install the dependence
# 1. download the tesseract from the official link:
# https://github.com/UB-Mannheim/tesseract/wiki
# 2. install the tesseract
# i chosed this path
# *replace the user string in the below path with you name of user that you are using in your current machine
# C:\Users\user\AppData\Local\Tesseract-OCR\
# 3. Install the pillow for your python version
# * the best way for me is to install is this form(i'am using python3.7 version and in my CMD i run this version of python by typing py -3.7):
# * if you are using another version of python first look how you start the python from you CMD
# * for some machine the run of python from the CMD is different
# [examples]
# =================================
# PYTHON VERSION 3.7
# python
# python3.7
# python -3.7
# python 3.7
# python3
# python -3
# python 3
# py3.7
# py -3.7
# py 3.7
# py3
# py -3
# py 3
# PYTHON VERSION 3.6
# python
# python3.6
# python -3.6
# python 3.6
# python3
# python -3
# python 3
# py3.6
# py -3.6
# py 3.6
# py3
# py -3
# py 3
# PYTHON VERSION 2.7
# python
# python2.7
# python -2.7
# python 2.7
# python2
# python -2
# python 2
# py2.7
# py -2.7
# py 2.7
# py2
# py -2
# py 2
# ================================
# we are using pip to install the dependences
# because for me i start the python version 3.7 with the following line
# py -3.7
# open the CMD in windows machine and type the following line:
# py -3.7 -m pip install pillow
# 4. Install the pytesseract and tesseract for your python version
# * the best way for me is to install is this form(i'am using python3.7 version and in my CMD i run this version of python by typing py -3.7):
# we are using pip to install the dependences
# open the CMD in windows machine and type the following lines:
# py -3.7 -m pip install pytesseract
# py -3.7 -m pip install tesseract
#!/usr/bin/python
from PIL import Image
import pytesseract
import os
import getpass
def extract_text_from_image(image_file_name_arg):
# IMPORTANT
# if you have followed my instructions to install this dependence in above text explanatin
# for my machine is
# if you don't put the right path for tesseract.exe the script will not work
username = getpass.getuser()
# here above line get the username for your machine automatically
tesseract_exe_path_installation="C:\\Users\\"+username+"\\AppData\\Local\\Tesseract-OCR\\tesseract.exe"
pytesseract.pytesseract.tesseract_cmd=tesseract_exe_path_installation
# specify the direction of your image files manually or use line bellow if the images are in the script directory in folder images
# image_dir="D:\\GIT\\ai_example\\extract_text_from_image\\images"
image_dir=os.getcwd()+"\\images"
dir_seperator="\\"
image_file_name=image_file_name_arg
# if your image are in different format change the extension(ex. ".png")
image_ext=".jpg"
image_path_dir=image_dir+dir_seperator+image_file_name+image_ext
print("=============================================================================")
print("image used is in the following path dir:")
print("\t"+image_path_dir)
print("=============================================================================")
img=Image.open(image_path_dir)
text=pytesseract.image_to_string(img, lang="eng")
print(text)
# change the name "image_1" whith the name without extension for your image name
# image_file_name_arg="image_1"
image_file_name_arg="image_2"
# image_file_name_arg="image_3"
# image_file_name_arg="image_4"
# image_file_name_arg="image_5"
extract_text_from_image(image_file_name_arg)
# ==================================
# CREATED BY: SHERIFI
# e-mail: sherif_co@yahoo.com
# git-link for script: https://github.com/sherifi/ai_example.git
# ==================================
回答13:
In windows, the command path must be redirected, for a default windows tesseract installation.
- In 32 bit system, add in this line after import commands.
pytesseract.pytesseract.tesseract_cmd = 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
- In 64 bit system, add this line instead.
pytesseract.pytesseract.tesseract_cmd = 'C:\Program Files\Tesseract-OCR\tesseract.exe'
回答14:
For Ubuntu 18.04
If you are getting an error like
tesseract is not installed or it's not in your path
and
OSError: [Errno 12] Cannot allocate memory
That might be and issue with the swap memory allocation issue
You can check this answer allocating more swap memory Hope that helps :)
https://askubuntu.com/questions/920595/fallocate-fallocate-failed-text-file-busy-in-ubuntu-17-04?answertab=active#tab-top