Cannot make Tesseract work in Google App Engine

Posted 2020-03-26 09:08

I am trying to deploy an application with an OCR feature on Google App Engine. I installed Tesseract with Homebrew and use pytesseract to call it from Python. The OCR function works on my local machine, but not after I deploy the application to Google App Engine.

I copied the tesseract folder from usr/local/cellar/tesseract into my app's working directory and uploaded both the tesseract and the pytesseract files to App Engine. I set the tesseract path relative to os.getcwd() so that pytesseract can find it. Nevertheless, this does not work: App Engine cannot find the file to execute, because it is not in that directory (os.getcwd()).

Code from pytesseract.py

import os

cmda = os.getcwd()
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY


def find_all(name, path):
    # Walk `path` recursively and collect every file called `name`.
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
    return result

founds = find_all("tesseract", cmda)

# Raises IndexError if no file named "tesseract" exists under cmda.
tesseract_cmd = founds[0]

The error from Google App Engine is:

tesseract is not installed on your path.
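The `find_all` approach above fails with an unhelpful `IndexError` when nothing is found, and it accepts any file named `tesseract`, executable or not. Below is a minimal sketch of a sturdier lookup, assuming you still want to point pytesseract at a specific binary. It tries `PATH` first (pytesseract's default `tesseract_cmd` is the bare name `tesseract`, which the OS resolves through `PATH`), then walks a directory for an executable file. The function name `locate_tesseract` is hypothetical, not part of pytesseract.

```python
import os
import shutil
from typing import Optional


def locate_tesseract(search_root: Optional[str] = None, name: str = "tesseract") -> str:
    """Return a path to an executable called `name` (hypothetical helper).

    1. Look on PATH, which is where a bare `tesseract` command is resolved.
    2. Otherwise walk `search_root` for a file with that name that has
       the execute permission set.
    Raise FileNotFoundError with a readable message if neither works.
    """
    on_path = shutil.which(name)
    if on_path:
        return on_path

    if search_root is not None:
        for root, _dirs, files in os.walk(search_root):
            if name in files:
                candidate = os.path.join(root, name)
                # Skip files that exist but cannot be executed.
                if os.access(candidate, os.X_OK):
                    return candidate

    raise FileNotFoundError(
        f"{name!r} was not found on PATH or as an executable under {search_root!r}"
    )
```

You would then assign the result with `pytesseract.pytesseract.tesseract_cmd = locate_tesseract(os.getcwd())`. Finding the file does not guarantee it can run on the server, so this only fixes the lookup, not the missing platform packages.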

1 Answer
聊天终结者
Answered 2020-03-26 09:33

The Google App Engine Standard environment is not suitable for your use case. The pytesseract and Pillow libraries can be installed via pip, but pytesseract also needs the tesseract-ocr and libtesseract-dev platform packages, which are not included in the base image of the App Engine Standard Python 3.7 runtime. That is why you get this error. Copying the Homebrew build into your app directory does not get around this, because that binary was compiled for macOS, while App Engine instances run Linux.
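To confirm this diagnosis on whatever environment you deploy to, a small check can report whether a `tesseract` executable is reachable and actually runs. This is a sketch under the assumption that invoking `tesseract --version` is an acceptable probe; `tesseract_version` is a hypothetical helper name. It returns `None` both when the binary is missing from `PATH` (the App Engine Standard case) and when it exists but fails to execute, for example with an `OSError` such as "Exec format error" for a binary built for a different operating system.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `<cmd> --version`, or None if it cannot run."""
    path = shutil.which(cmd)
    if path is None:
        # Not on PATH (or not executable): pytesseract would fail here too.
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            check=True,   # non-zero exit status counts as "cannot run"
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # OSError covers binaries the kernel refuses to execute;
        # SubprocessError covers non-zero exits and timeouts.
        return None
    # Some Tesseract builds may print the version to stderr, so check both.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract is not available in this environment")
```

Logging this once at startup makes the failure mode explicit instead of surfacing only when the first OCR request arrives.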

The solution is to use Cloud Run, which runs your application in a Docker container, so you can customize the runtime and install the system packages you need. I adapted the Cloud Run Quickstart guide into a sample application that converts an image to text using pytesseract.

My folder structure:

sample/
├── requirements.txt
├── Dockerfile
├── app.py
└── test.png

Here is the Dockerfile:

# Use the official Python image.
# https://hub.docker.com/_/python
FROM python:3.7

# Install tesseract first, so this layer stays cached when only the app code changes.
RUN apt-get update -qqy && apt-get install -qqy \
        tesseract-ocr \
        libtesseract-dev \
    && rm -rf /var/lib/apt/lists/*

# Copy local code to the container image.
ENV APP_HOME=/app
WORKDIR $APP_HOME
COPY . ./

# Install production dependencies.
RUN pip install Flask gunicorn
RUN pip install -r requirements.txt

# Run the web service on container startup. Here we use the gunicorn
# webserver, with one worker process and 8 threads.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app
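The Dockerfile comment suggests matching the number of gunicorn workers to the available cores. One way to do that without editing the CMD line for each machine size is a gunicorn configuration file, which is plain Python. This is a minimal sketch assuming the file is saved as `gunicorn.conf.py` next to `app.py` and the container is started with `gunicorn -c gunicorn.conf.py app:app`. Note that `os.cpu_count()` reports the CPUs visible to the process, which may not equal the vCPUs Cloud Run allocates to your service, so verify it before relying on it.

```python
# gunicorn.conf.py (assumed file name): the same settings as the CMD line,
# computed at startup instead of hard-coded.
import os

# Cloud Run passes the port to listen on through the PORT environment
# variable; fall back to 8080 for local runs.
bind = f":{os.environ.get('PORT', '8080')}"

# One worker per visible CPU, as the Dockerfile comment recommends.
# os.cpu_count() can return None, so fall back to a single worker.
workers = os.cpu_count() or 1

# Same thread count as the original CMD line.
threads = 8
```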

The contents of app.py:

import os

from flask import Flask
from PIL import Image
import pytesseract


# gunicorn is started with `app:app` in the Dockerfile, so it looks for
# an object called `app` in the module `app.py`.
app = Flask(__name__)

@app.route('/')
def hello():
    # Run OCR on the bundled sample image and return the recognized text.
    return pytesseract.image_to_string(Image.open('test.png'))


if __name__ == "__main__":
    # Local development only; on Cloud Run the container runs gunicorn instead.
    app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

The requirements.txt:

Flask==1.1.1
pytesseract==0.3.0
Pillow==6.2.0

Now, to containerize and deploy your application, run:

  1. gcloud builds submit --tag gcr.io/<PROJECT_ID>/helloworld to build the container image and push it to Container Registry.

  2. gcloud beta run deploy --image gcr.io/<PROJECT_ID>/helloworld --platform managed to deploy the container to Cloud Run.
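The two steps can also be scripted. The sketch below, assuming the `gcloud` CLI is installed and authenticated, builds the same two commands as argument lists and only prints them (a dry run) unless told to execute them. The helper names and the project ID are hypothetical, and it passes `helloworld` as the positional service name for `gcloud beta run deploy`, which the commands above leave out, so gcloud does not need to prompt for it.

```python
import shlex
import subprocess
from typing import List


def deploy_commands(project_id: str, service: str = "helloworld") -> List[List[str]]:
    """Return the build and deploy commands as argument lists (hypothetical helper)."""
    image = f"gcr.io/{project_id}/{service}"
    build = ["gcloud", "builds", "submit", "--tag", image]
    deploy = ["gcloud", "beta", "run", "deploy", service,
              "--image", image, "--platform", "managed"]
    return [build, deploy]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(arg) for arg in cmd))
        if not dry_run:
            # check=True raises on a failed build, so the deploy step is
            # skipped instead of shipping whatever image was pushed earlier.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_commands(deploy_commands("my-project-id"))  # hypothetical project ID
```

Passing `dry_run=False` runs the build and then the deploy, stopping at the first command that fails.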
