GAE flex uses gunicorn as the entrypoint by default, which is fine, except I have a function that takes a very long time to run (scraping websites and storing data in a db). Gunicorn times out at 30 seconds by default, then a new worker starts the task all over again, and so on.
I can set the gunicorn timeout to something like 20 minutes, but it doesn't seem graceful. Is there any way to run these backend functions "outside" of gunicorn, or perhaps a gunicorn config I'm not thinking about? There is no client side, so the long time to complete isn't an issue.
My app.yaml file currently looks like this:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT main:app

runtime_config:
  python_version: 2

# This sample incurs costs to run on the App Engine flexible environment.
# The settings below are to reduce costs during testing and are not appropriate
# for production use. For more information, see:
# https://cloud.google.com/appengine/docs/flexible/python/configuring-your-app-with-app-yaml
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 3
  disk_size_gb: 10
You can use an async worker class, and then you won't need to set the timeout to 20 minutes. The default worker class is sync. Docs regarding the workers here.
Use the eventlet async worker (gevent is not recommended if you are using the Google client libraries).
Then, in your gunicorn configuration, set worker_class = 'eventlet' and set the number of workers to (number of cores) x 2 + 1 (that's just a recommendation in the Google docs). For example:
Gunicorn Worker Configuration
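A minimal sketch of such a configuration, assuming it lives in a file named gunicorn.conf.py next to main.py. The file name and the 8080 fallback port are my assumptions; worker_class, workers and bind are standard gunicorn settings:

```python
# gunicorn.conf.py (assumed file name)
import multiprocessing
import os

# Bind to the port App Engine provides; 8080 is an assumed fallback for local runs.
bind = ":" + os.environ.get("PORT", "8080")

# Use the eventlet async worker instead of the default sync worker.
worker_class = "eventlet"

# (number of cores) x 2 + 1, as recommended above.
workers = multiprocessing.cpu_count() * 2 + 1
```

The entrypoint in app.yaml would then become something like gunicorn -c gunicorn.conf.py main:app. The eventlet package has to be installed (for example via requirements.txt) for this worker class to load.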
Alternatively, use the implementation described here, which uses Pub/Sub and workers.
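To show the shape of that second approach, here is a rough sketch of the hand-off pattern only. It is not the linked implementation: an in-process queue stands in for the Pub/Sub topic, and long_running_job, enqueue and the example URLs are hypothetical names made up for illustration. The idea is that the request handler only enqueues the work and returns right away, while a separate worker does the slow part outside the request cycle.

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

# In-process stand-in for a Pub/Sub topic. A real deployment would publish to
# Cloud Pub/Sub and consume the messages in a separate worker service.
tasks = queue.Queue()
results = []


def long_running_job(url):
    # Hypothetical placeholder for the scraping and database work.
    return "scraped " + url


def worker():
    # Pull tasks one at a time, forever; each may take as long as it needs.
    while True:
        url = tasks.get()
        results.append(long_running_job(url))
        tasks.task_done()


def enqueue(url):
    # What the request handler would do: hand off the work and return at once.
    tasks.put(url)


worker_thread = threading.Thread(target=worker)
worker_thread.daemon = True  # do not keep the process alive for the worker
worker_thread.start()

enqueue("http://example.com/a")
enqueue("http://example.com/b")
tasks.join()  # only for this demo: wait until both tasks are processed
```

With real Pub/Sub the queue survives restarts and the worker can be scaled separately, which an in-process thread like this one cannot offer.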