I just upgraded to Celery 3.1 and now I see this in my logs for every queue/worker in my cluster:
on_node_lost - INFO - missed heartbeat from celery@queue_name
According to the docs, BROKER_HEARTBEAT is off by default, and I haven't configured it.
Should I explicitly set BROKER_HEARTBEAT=0, or is there something else that I should be checking?
Saw the same thing, and noticed a couple of things in the log files.
1) There were messages about time drift at the start of the log and occasional missed heartbeats.
2) At the end of the log file, the drift messages went away and only the missed heartbeat messages were present.
3) There were no changes to the system when the drift messages went away... They just stopped showing up.
I figured that the drift itself was likely the problem.
After syncing the time on all the servers involved, these messages went away. On Ubuntu, run ntpdate from cron, or run ntpd.
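
To illustrate why clock drift alone can produce these messages, here is a toy model in Python. It is only a sketch, not Celery's actual implementation: the function names and the 4 second threshold are made up for the example. It assumes the sender stamps each heartbeat with its own clock and the receiver judges the heartbeat's age with a different clock.

```python
def apparent_age(stamped_at, received_at):
    """Age of a heartbeat as the receiver sees it, in seconds."""
    return received_at - stamped_at


def looks_missed(stamped_at, received_at, max_age):
    """True if the heartbeat appears older than the receiver tolerates."""
    return apparent_age(stamped_at, received_at) > max_age


MAX_AGE = 4.0              # hypothetical threshold, not a Celery constant
true_send_time = 100.0     # moment the heartbeat really left the sender
true_receive_time = 100.5  # it really arrived 0.5 s later

# Clocks in sync: the sender's stamp equals the true send time.
in_sync = looks_missed(true_send_time, true_receive_time, MAX_AGE)

# Sender clock running 10 s behind: the stamp is 10 s too old.
sender_offset = -10.0
drifted = looks_missed(true_send_time + sender_offset, true_receive_time, MAX_AGE)

print(in_sync, drifted)
```

The heartbeat arrives equally fast in both cases. Only the sender's clock offset makes it look stale, which fits the observation that syncing the clocks made the messages go away.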
Celery 3.1 added the new mingle and gossip procedures. I too was getting a ton of missed heartbeats, and passing --without-gossip to my workers cleared it up.
http://docs.celeryproject.org/en/latest/whatsnew-3.1.html#mingle-worker-synchronization
http://docs.celeryproject.org/en/latest/whatsnew-3.1.html#gossip-worker-worker-communication
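
If you start workers from a script, a minimal sketch of building the command line could look like this. The app name `proj` and the helper `worker_command` are hypothetical. `--without-gossip` is the flag mentioned above, and `--without-mingle` is the corresponding switch for mingle, which this sketch leaves enabled by default.

```python
def worker_command(app_name, gossip=False, mingle=True):
    """Build the argv list for starting a Celery worker.

    app_name is whatever you normally pass to -A ("proj" here is made up).
    """
    cmd = ["celery", "-A", app_name, "worker", "--loglevel=INFO"]
    if not gossip:
        cmd.append("--without-gossip")   # disable worker-to-worker gossip
    if not mingle:
        cmd.append("--without-mingle")   # skip startup synchronization
    return cmd


# Show the command; pass the list to subprocess.call() to actually run it.
print(" ".join(worker_command("proj")))
```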
I'm having a similar issue, and I found the cause in my case.
I have two servers running workers.
When I ping one server from the other and the ping time is larger than 2 seconds, the log shows "missed heartbeat from celery@". The default heartbeat interval is 2 seconds.
The cause is my poor network.
http://docs.celeryproject.org/en/latest/internals/reference/celery.worker.heartbeat.html
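
A rough way to check for this from Python, as a sketch only: time a TCP connection to the other machine and compare it with the 2 second interval quoted above. TCP connect time is just a proxy for ping time, and the function names, host name, and port in the comment are assumptions for illustration.

```python
import socket
import time

HEARTBEAT_INTERVAL = 2.0  # seconds; the default interval quoted above


def connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection to host:port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection opened; close it again immediately
    return time.monotonic() - start


def slower_than_heartbeat(delay, interval=HEARTBEAT_INTERVAL):
    """True if a measured delay exceeds the heartbeat interval."""
    return delay > interval


# Example with a made-up broker host and the usual AMQP port:
# delay = connect_time("broker.example.com", 5672)
# print(delay, slower_than_heartbeat(delay))
```

If the delay regularly exceeds the interval, the network is a more likely culprit than the Celery configuration.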