Celery: stuck in infinitely repeating timeouts (Tim

Posted 2020-07-20 07:19

Question:

I defined some tasks with a time limit of 1200 seconds:

@celery.task(time_limit=1200)
def create_ne_list(text):
    c = Client()
    return c.create_ne_list(text)

I'm also using the worker_process_init signal to run some initialization each time a new worker process starts:

from celery.signals import worker_process_init

# Runs once in every newly started pool process.
@worker_process_init.connect
def init(sender=None, conf=None, **kwargs):
    init_system(celery.conf)
    init_pdf(celery.conf)

This initialization function takes several seconds to execute.

Besides that, I'm using the following configuration:

CELERY_RESULT_SERIALIZER = 'json'
CELERY_TASK_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT = ['json']
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'
CELERY_TIMEZONE = 'Europe/Berlin'
CELERY_ENABLE_UTC = True

and start my worker with the following command:

celery -A isc worker -l info --concurrency=3

As expected, starting the worker calls the initialization function three times, once per pool process. I can then send tasks, they are executed, and everything seems to run smoothly.

BUT: As soon as a task exceeds its time limit, the worker gets caught in an infinite loop of spawning a new process and killing it off again.

[2014-06-13 09:46:18,978: ERROR/MainProcess] Timed out waiting for UP message from <Worker(Worker-20381, started daemon)>
[2014-06-13 09:46:20,000: ERROR/MainProcess] Process 'Worker-20381' pid:18953 exited with 'signal 9 (SIGKILL)'
// new worker 20382 gets started, initialization is triggered, and soon after that -->
[2014-06-13 09:46:18,978: ERROR/MainProcess] Timed out waiting for UP message from <Worker(Worker-20382, started daemon)>
[2014-06-13 09:46:20,000: ERROR/MainProcess] Process 'Worker-20382' pid:18954 exited with 'signal 9 (SIGKILL)'
// and so on....

Does anyone have an idea why this is happening?

Answer 1:

The answer seems to be that a handler connected to the worker_process_init signal must not block for more than 4 seconds:

http://celery.readthedocs.org/en/latest/userguide/signals.html#worker-process-init

Because my init function takes longer than that to execute, the new worker process is terminated automatically. The pool then starts a replacement, which triggers the init function again, so that process is terminated as well, and so on.
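
One possible way around this, sketched here under assumptions rather than taken from the original post, is to keep the signal handler cheap and defer the slow setup until a task first needs it. The LazyInit class, slow_setup, and the resources object below are hypothetical names used for illustration. slow_setup stands in for init_system and init_pdf, and the plain create_ne_list function stands in for the decorated Celery task so the snippet runs without a broker.

```python
import threading


class LazyInit:
    """Run an expensive setup function at most once, on first use."""

    def __init__(self, setup):
        self._setup = setup
        self._lock = threading.Lock()
        self._done = False
        self._value = None

    def get(self):
        # Double-checked locking: after the first call this is a cheap read.
        if not self._done:
            with self._lock:
                if not self._done:
                    self._value = self._setup()
                    self._done = True
        return self._value


calls = []  # records how often the setup actually ran


def slow_setup():
    # Hypothetical stand-in for init_system(...) and init_pdf(...).
    calls.append("setup")
    return {"ready": True}


resources = LazyInit(slow_setup)


def create_ne_list(text):
    # In a real worker this would be the @celery.task function.
    # The slow setup runs here, inside the task, not in the signal handler.
    state = resources.get()
    return (state["ready"], text.upper())
```

With this arrangement nothing slow runs while the pool is waiting for the new process to report that it is up. The trade-off is that the first task handled by each process pays the setup cost, and that time counts against the task's own time limit.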