How do I run a long-running job in the background

Question:

I have a web service that runs long-running jobs (on the order of several hours). I am developing it using Flask, Gunicorn, and nginx.

What I am thinking of doing is to have the route that takes a long time to complete call a function that creates a thread. The function will then return a GUID back to the route, and the route will return a URL (using the GUID) that the user can use to check progress. I am making the thread a daemon (thread.daemon = True) so that the thread exits if my calling code exits unexpectedly.

Is this the correct approach to use? It works, but that doesn't mean that it is correct.

my_thread = threading.Thread(target=self._run_audit, args=())
my_thread.daemon = True
my_thread.start()
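
For reference, a minimal sketch of that design (the route names and the in-memory status dict are illustrative, and the dict only works with a single worker process):

import threading
import uuid

from flask import Flask, jsonify, url_for

app = Flask(__name__)
audits = {}  # guid -> status; illustrative in-memory store

def _run_audit(guid):
    audits[guid] = 'running'
    # ... hours of work ...
    audits[guid] = 'finished'

@app.route('/audits', methods=['POST'])
def start_audit():
    guid = str(uuid.uuid4())
    my_thread = threading.Thread(target=_run_audit, args=(guid,))
    my_thread.daemon = True  # thread exits if the worker process exits
    my_thread.start()
    return jsonify(progress_url=url_for('check_audit', guid=guid)), 202

@app.route('/audits/<guid>')
def check_audit(guid):
    return jsonify(status=audits.get(guid, 'unknown'))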

Answer 1:

The more common approach to handling such an issue is to extract the action from the base application and run it outside, using a task queue system like Celery.

Using this tutorial, you can create your task and trigger it from your web application.

from flask import Flask

app = Flask(__name__)
app.config.update(
    CELERY_BROKER_URL='redis://localhost:6379',
    CELERY_RESULT_BACKEND='redis://localhost:6379'
)
celery = make_celery(app)


@celery.task()
def add_together(a, b):
    return a + b
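
The make_celery factory comes from the tutorial; a rough sketch of it, following the Flask documentation's pattern, looks like this:

from celery import Celery

def make_celery(app):
    # build a Celery instance configured from the Flask app
    celery = Celery(
        app.import_name,
        backend=app.config['CELERY_RESULT_BACKEND'],
        broker=app.config['CELERY_BROKER_URL'],
    )
    celery.conf.update(app.config)

    class ContextTask(celery.Task):
        def __call__(self, *args, **kwargs):
            # run each task inside the Flask application context
            with app.app_context():
                return self.run(*args, **kwargs)

    celery.Task = ContextTask
    return celery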

Then you can run:

>>> result = add_together.delay(23, 42)
>>> result.wait()
65

Just remember that you need to run the worker separately:

celery -A your_application worker


Answer 2:

Celery and RQ are over-engineering for a simple task. Take a look at these docs: https://docs.python.org/3/library/concurrent.futures.html

Also check this example of how to run long-running jobs in the background of a Flask app: https://stackoverflow.com/a/39008301/5569578
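
As a minimal sketch of that idea (the route names and the in-memory jobs dict are illustrative, and the dict assumes a single worker process):

import uuid
from concurrent.futures import ThreadPoolExecutor

from flask import Flask, jsonify

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=2)  # illustrative pool size
jobs = {}  # job id -> Future; single-process only

def long_job():
    # ... hours of work ...
    return 'result'

@app.route('/jobs', methods=['POST'])
def submit_job():
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(long_job)
    return jsonify(job_id=job_id), 202

@app.route('/jobs/<job_id>')
def job_status(job_id):
    future = jobs.get(job_id)
    if future is None:
        return jsonify(error='unknown job'), 404
    if future.done():
        return jsonify(status='finished', result=future.result())
    return jsonify(status='running')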



Answer 3:

Well, although your approach is not incorrect, it may basically lead your program to run out of available threads. As Ali mentioned, a general approach is to use a job queue like RQ or Celery. However, you don't need to extract functions to use those libraries. For Flask, I recommend Flask-RQ. It's simple to get started:

RQ

pip install flask-rq

Just remember to install and run Redis before using it in your Flask app.

Then simply use the @job decorator on your Flask functions:

from flask_rq import job


@job
def process(i):
    # long-running work goes here
    pass


process.delay(3)

Finally, you need rqworker to start the worker:

rqworker

See the RQ docs for more info; RQ is designed for simple, long-running processes.
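
If you need to poll a job's state from the web app (as in the question), plain RQ exposes the enqueued job object; a rough sketch, assuming a local Redis and that the worker can import the task function:

from redis import Redis
from rq import Queue

def process(i):
    # long-running work goes here; in a real app this must live in a
    # module the rqworker process can import
    return i * 2

queue = Queue(connection=Redis())
job = queue.enqueue(process, 3)  # returns an rq.job.Job

print(job.get_status())  # 'queued', 'started', or 'finished'
print(job.result)        # None until the job has finished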

Celery

Celery is more complicated and has a huge list of features; it is not recommended if you are new to job queues and distributed processing methods.

Greenlets

Greenlets have explicit switches that let you jump between long-running functions. You can use greenlets to run your processes cooperatively. The benefit is that you don't need Redis or a separate worker; instead, you have to redesign your functions to be compatible:

from greenlet import greenlet

def test1():
    print(12)
    gr2.switch()
    print(34)

def test2():
    print(56)
    gr1.switch()
    print(78)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()
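
Running this prints 12, 56, and 34; the final print(78) never executes, because when test1 finishes, control returns to the main greenlet rather than resuming test2.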


Answer 4:

Your approach is fine and will totally work, but why reinvent the background worker for Python web applications when a widely accepted solution exists, namely Celery?

I'd need to see a lot of tests before I trusted any home-rolled code for such an important task.

Plus, Celery gives you features like task persistence and the ability to distribute workers across multiple machines.