I've done a ton of research on this, and I'm surprised I haven't found a good answer to this yet anywhere.
I'm running a large application on Heroku, and I have certain celery tasks that run for a very long time processing, and at the end of the task save a result. Every time I redeploy on Heroku, it sends SIGTERM (and eventually, SIGKILL) and kills my running worker. I'm trying to find a way for the worker instance to shut itself down gracefully and re-queue itself for processing later so that eventually we can save the required result instead of losing the queued task.
I cannot find a way that works to have the worker listen for SIGTERM properly. The closest I've gotten, which works when running python manage.py celeryd
directly but NOT when emulating Heroku using foreman, is the following:
@app.task(bind=True, max_retries=1)
def slow(self, x):
try:
for x in range(100):
print 'x: ' + unicode(x)
time.sleep(10)
except exceptions.MaxRetriesExceededError:
logger.error('whoa')
except (exceptions.WorkerShutdown, exceptions.WorkerTerminate) as exc:
logger.error(u'retrying, ' + unicode(exc))
raise self.retry(exc=exc, countdown=10)
except (KeyboardInterrupt, SystemExit) as exc:
print 'retrying'
raise self.retry(exc=exc, countdown=10)
else:
return x
finally:
logger.info('task ended!')
When I start this celery task running within foreman and hit Ctrl+C, the following happens:
^CSIGINT received
22:20:59 system | sending SIGTERM to all processes
22:20:59 web.1 | exited with code 0
22:21:04 system | sending SIGKILL to all processes
Killed: 9
So it's clear that none of the celery exceptions, nor the KeyboardInterrupt
or SystemExit
exceptions I've seen in other posts, properly catch SIGTERM and shut down the worker.
What is the right way to do this?