I have a web service that runs long-running jobs (on the order of several hours). I am developing this using Flask, Gunicorn, and nginx.
What I am thinking of doing is to have the route that takes a long time to complete call a function that creates a thread. That function will then return a GUID back to the route, and the route will return a URL (containing the GUID) that the user can use to check progress. I am making the thread a daemon (thread.daemon = True) so that the thread exits if my calling code exits unexpectedly.
Is this the correct approach to use? It works, but that doesn't mean that it is correct.
my_thread = threading.Thread(target=self._run_audit, args=())
my_thread.daemon = True
my_thread.start()
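For reference, here is a minimal sketch of the whole pattern I am describing; the audit body and route names are simplified placeholders:

import threading
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
jobs = {}  # guid -> status, held in this process's memory

def run_audit(guid):
    jobs[guid] = 'running'
    # ... several hours of work ...
    jobs[guid] = 'done'

@app.route('/audit', methods=['POST'])
def start_audit():
    guid = str(uuid.uuid4())
    my_thread = threading.Thread(target=run_audit, args=(guid,))
    my_thread.daemon = True
    my_thread.start()
    return jsonify(status_url='/audit/' + guid), 202

@app.route('/audit/<guid>')
def audit_status(guid):
    return jsonify(status=jobs.get(guid, 'unknown'))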
The more common approach to handling this kind of task is to extract the action from the base application and run it outside of it, using a task queue system like Celery.
Following this tutorial, you can create your task and trigger it from your web application.
from flask import Flask

app = Flask(__name__)
app.config.update(
    CELERY_BROKER_URL='redis://localhost:6379',
    CELERY_RESULT_BACKEND='redis://localhost:6379'
)
celery = make_celery(app)

@celery.task()
def add_together(a, b):
    return a + b
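The make_celery factory used above is not shown in the snippet; the version from the Flask documentation looks roughly like this:

from celery import Celery

def make_celery(app):
    # Build a Celery instance from the Flask config and make every
    # task run inside the application context.
    celery = Celery(
        app.import_name,
        broker=app.config['CELERY_BROKER_URL'],
        backend=app.config['CELERY_RESULT_BACKEND'],
    )
    celery.conf.update(app.config)

    class ContextTask(celery.Task):
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    celery.Task = ContextTask
    return celery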
Then you can run:
>>> result = add_together.delay(23, 42)
>>> result.wait()
65
Just remember that you need to run the worker separately:
celery -A your_application worker
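If you'd rather not block on result.wait(), you can return the task id from your route and let the client poll it, which maps directly onto the GUID-and-progress-URL idea from the question. A rough sketch, with invented route names:

from flask import jsonify

@app.route('/add/<int:a>/<int:b>', methods=['POST'])
def start_add(a, b):
    result = add_together.delay(a, b)
    return jsonify(status_url='/status/' + result.id), 202

@app.route('/status/<task_id>')
def task_status(task_id):
    result = add_together.AsyncResult(task_id)
    # state will be e.g. PENDING, STARTED, SUCCESS, or FAILURE
    payload = {'state': result.state}
    if result.successful():
        payload['result'] = result.result
    return jsonify(payload)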
Celery and RQ are overengineering for a simple task.
Take a look at these docs - https://docs.python.org/3/library/concurrent.futures.html
Also check this example of how to run long-running jobs in the background of a Flask app - https://stackoverflow.com/a/39008301/5569578
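To make that concrete, here is a sketch using a module-level ThreadPoolExecutor and an in-memory dict of futures; all names are made up:

import uuid
from concurrent.futures import ThreadPoolExecutor

from flask import Flask, jsonify

app = Flask(__name__)
executor = ThreadPoolExecutor(max_workers=4)  # caps concurrent jobs
futures = {}  # guid -> Future

def long_job():
    # ... hours of work ...
    return 'report'

@app.route('/jobs', methods=['POST'])
def submit_job():
    guid = str(uuid.uuid4())
    futures[guid] = executor.submit(long_job)
    return jsonify(status_url='/jobs/' + guid), 202

@app.route('/jobs/<guid>')
def job_status(guid):
    future = futures.get(guid)
    if future is None:
        return jsonify(error='unknown job'), 404
    if future.done():
        return jsonify(state='done', result=future.result())
    return jsonify(state='running')

Unlike a bare thread per request, the executor bounds concurrency at max_workers; note that the futures dict lives in one process, so this only works with a single Gunicorn worker.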
Well, although your approach is not incorrect, basically it may lead your program to run out of available threads. As Ali mentioned, a general approach is to use job queues like RQ or Celery. However, you don't need to extract functions to use those libraries. For Flask, I recommend using Flask-RQ. It's simple to start:
RQ
pip install flask-rq
Just remember to install Redis before using it in your Flask app.
And simply use the @job decorator in your Flask functions:
from flask_rq import job  # the old flask.ext.rq namespace is deprecated

@job
def process(i):
    # Long stuff to process
    pass

process.delay(3)
And finally, run rqworker to start the worker:
rqworker
You can see the RQ docs for more info. RQ is designed for simple, long-running processes.
Celery
Celery is more complicated and has a huge list of features; it is not recommended if you are new to job queues and distributed processing methods.
Greenlets
Greenlets have switches that let you switch between long-running processes. You can use greenlets for running processes. The benefit is that you don't need Redis or another worker; instead, you have to redesign your functions to be compatible:
from greenlet import greenlet

def test1():
    print(12)
    gr2.switch()
    print(34)

def test2():
    print(56)
    gr1.switch()
    print(78)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()  # prints 12, 56, 34; test2 never resumes, so 78 is not printed
Your approach is fine and will totally work, but why reinvent the background worker for Python web applications when a widely accepted solution already exists, namely Celery?
I'd need to see a lot of tests before I trusted any home-rolled code for such an important task.
Plus, Celery gives you features like task persistence and the ability to distribute workers across multiple machines.