I am working on a small but computationally intensive Python app. The computationally intensive work can be broken into several pieces that can be executed concurrently. I am trying to identify a suitable stack to accomplish this.
Currently I am planning to use a Flask app on Apache2+WSGI with Celery for the task queue.
In the following, will a_long_process(), another_long_process() and yet_another_long_process() execute concurrently if there are 3 or more workers available? Will the Flask app be blocked while the processes are executing?
from the Flask app:
from tasks import a_long_process, another_long_process, yet_another_long_process

@myapp.route('/foo')
def bar():
    task_1 = a_long_process.delay(x, y)
    task_1_result = task_1.get(timeout=1)
    task_2 = another_long_process.delay(x, y)
    task_2_result = task_2.get(timeout=1)
    task_3 = yet_another_long_process.delay(x, y)
    task_3_result = task_3.get(timeout=1)
    return task_1_result + task_2_result + task_3_result
tasks.py:
from celery import Celery

celery = Celery('tasks', broker="amqp://guest@localhost//", backend="amqp://")

@celery.task
def a_long_process(x, y):
    return something

@celery.task
def another_long_process(x, y):
    return something_else

@celery.task
def yet_another_long_process(x, y):
    return a_third_thing
According to the documentation for result.get(), it waits until the result is ready before returning, so normally it is in fact blocking. However, since you pass timeout=1, the call to get() will raise a TimeoutError if the task takes longer than 1 second to complete.
By default, Celery workers start with a concurrency level equal to the number of CPUs available. The concurrency level determines the number of child processes (or threads, depending on the worker pool) used to process tasks. So, with a concurrency level >= 3, a single Celery worker should be able to process that many tasks concurrently (perhaps someone with greater Celery expertise can verify this?).
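As a rough sketch (reusing a_long_process from the question; x and y are assumed to be defined), catching that TimeoutError could look like this:

from celery.exceptions import TimeoutError

task = a_long_process.delay(x, y)
try:
    # waits at most 1 second for a worker to finish the task
    result = task.get(timeout=1)
except TimeoutError:
    # the task did not finish in time; handle this however is appropriate
    result = None

The worker's concurrency can also be set explicitly when it is started, for example with celery -A tasks worker --concurrency=3.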
You should change your code so the workers can work in parallel:
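A sketch of that restructuring, reusing the names from the question (x and y are assumed to be defined as in the original route):

@myapp.route('/foo')
def bar():
    # dispatch all three tasks first so idle workers can pick them up in parallel
    task_1 = a_long_process.delay(x, y)
    task_2 = another_long_process.delay(x, y)
    task_3 = yet_another_long_process.delay(x, y)

    # only now wait for the results
    task_1_result = task_1.get(timeout=1)
    task_2_result = task_2.get(timeout=1)
    task_3_result = task_3.get(timeout=1)

    return task_1_result + task_2_result + task_3_result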
This code will block until all results are available (or the timeout is reached).
This code will only block one worker of your WSGI container. Whether the entire site becomes unresponsive depends on the WSGI container you are using (e.g. Apache + mod_wsgi, uWSGI, gunicorn, etc.). Most WSGI containers spawn multiple workers, so only one worker will be blocked while your code waits for the task results.
For this kind of application I would recommend using gevent, which spawns a separate greenlet for every request and is very lightweight.
Use the Group feature of celery canvas:
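Applied to the tasks from the question, a rough sketch (x and y again assumed to be defined as in the original route):

from celery import group

@myapp.route('/foo')
def bar():
    # dispatch all three tasks at once as a group
    job = group(
        a_long_process.s(x, y),
        another_long_process.s(x, y),
        yet_another_long_process.s(x, y),
    )
    result = job.apply_async()

    # get() returns a list with one entry per task, in the order they were added
    task_1_result, task_2_result, task_3_result = result.get(timeout=1)
    return task_1_result + task_2_result + task_3_result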
Here is the example provided in the documentation:
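(Reconstructed approximately from the Celery canvas docs; add is a simple task that returns the sum of its two arguments, and the import path is an assumption.)

from celery import group
from tasks import add  # add(x, y) is assumed to return x + y

# run the two add calls in parallel and print the collected results
print(group(add.s(2, 2), add.s(4, 4))().get())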
Which outputs [4, 8].