celery missed heartbeat (on_node_lost)

Posted 2019-03-13 09:25

Question:

I just upgraded to Celery 3.1 and now I see this in my logs:

on_node_lost - INFO - missed heartbeat from celery@queue_name

It appears for every queue/worker in my cluster.

According to the docs BROKER_HEARTBEAT is off by default and I haven't configured it.

Should I explicitly set BROKER_HEARTBEAT=0 or is there something else that I should be checking?

Answer 1:

I saw the same thing and noticed a few things in the log files.

1) There were messages about time drift at the start of the log and occasional missed heartbeats.

2) At the end of the log file, the drift messages went away and only the missed heartbeat messages were present.

3) There were no changes to the system when the drift messages went away... They just stopped showing up.

I figured the drift itself was likely the problem.

After syncing the time on all the servers involved, these messages went away. On Ubuntu, run ntpdate from a cron job, or run ntpd.
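To illustrate why unsynchronised clocks can look like lost nodes, here is a minimal sketch of a drift check on heartbeat timestamps. It is not Celery's own code: the function names and the 16-second threshold are assumptions chosen for illustration.

```python
import time

# Hypothetical threshold in seconds, chosen for illustration only.
MAX_DRIFT_SECONDS = 16.0


def clock_drift(remote_timestamp, local_received):
    """Return the absolute gap between the sender's timestamp and local time."""
    return abs(local_received - remote_timestamp)


def drift_is_suspicious(remote_timestamp, local_received,
                        max_drift=MAX_DRIFT_SECONDS):
    """True when the gap is too large to be explained by normal network delay."""
    return clock_drift(remote_timestamp, local_received) > max_drift


if __name__ == "__main__":
    now = time.time()
    # A heartbeat stamped 30 seconds ahead of local time suggests clock skew.
    print(drift_is_suspicious(now + 30.0, now))
    # A heartbeat stamped 50 ms in the past is ordinary network delay.
    print(drift_is_suspicious(now - 0.05, now))
```

If a check like this fires for every remote worker at once, the local clock is the more likely culprit, which matches the fix of syncing time on all servers.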



Answer 2:

Celery 3.1 added the new mingle and gossip procedures. I too was getting a ton of missed heartbeats, and passing --without-gossip to my workers cleared it up.

http://docs.celeryproject.org/en/latest/whatsnew-3.1.html#mingle-worker-synchronization

http://docs.celeryproject.org/en/latest/whatsnew-3.1.html#gossip-worker-worker-communication
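As a rough sketch of what passing the flag looks like, the helper below builds a worker command line with --without-gossip. The app name `proj`, the helper name `worker_command`, and the log level are assumptions for illustration; substitute your own values.

```python
def worker_command(app_name, without_gossip=True, without_mingle=False):
    """Build the argv list for starting a Celery worker.

    `app_name` is whatever you normally pass to `-A`; "proj" below is
    a placeholder, not a name from the original post.
    """
    argv = ["celery", "-A", app_name, "worker", "--loglevel=info"]
    if without_gossip:
        # Disables the worker-to-worker gossip channel added in 3.1.
        argv.append("--without-gossip")
    if without_mingle:
        # Optionally skip the startup synchronization step as well.
        argv.append("--without-mingle")
    return argv


if __name__ == "__main__":
    print(" ".join(worker_command("proj")))
    # prints: celery -A proj worker --loglevel=info --without-gossip
```

The resulting list can be handed to a process supervisor or to `subprocess.call`. Note that this only silences the gossip-related messages; it does not address clock or network problems behind them.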



Answer 3:

I had a similar issue and found the cause in my case.

I have two servers running workers.

When I pinged one server from the other, I found that whenever the ping time exceeded 2 seconds, the log showed "missed heartbeat from celery@". The default heartbeat interval is 2 seconds.

The cause was my poor network.

http://docs.celeryproject.org/en/latest/internals/reference/celery.worker.heartbeat.html
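The effect of network delay on heartbeats can be sketched as follows. This is a simplified model and not Celery's actual detection logic: the function name `late_gaps`, the tolerance factor, and the sample arrival times are all assumptions. Only the 2-second interval comes from the text above.

```python
HEARTBEAT_INTERVAL = 2.0  # seconds, the default interval quoted above


def late_gaps(arrival_times, interval=HEARTBEAT_INTERVAL, tolerance=1.5):
    """Return (earlier, later) arrival pairs whose gap exceeds the limit.

    `tolerance` is a hypothetical slack factor so that small jitter
    around the expected interval is not reported.
    """
    limit = interval * tolerance
    return [
        (earlier, later)
        for earlier, later in zip(arrival_times, arrival_times[1:])
        if later - earlier > limit
    ]


if __name__ == "__main__":
    # Made-up arrival times: one heartbeat is held up by a slow network.
    arrivals = [0.0, 2.1, 4.0, 8.5, 10.4]
    print(late_gaps(arrivals))
    # prints: [(4.0, 8.5)]
```

In this model, a single delay of a couple of seconds is enough to open a gap that a peer would read as a missed heartbeat, even though the worker itself never stopped.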