Django celerybeat periodic task only runs once

Posted 2019-09-16 07:30

Question:

I am trying to schedule a task that runs every 10 minutes using Django 1.9.8, Celery 4.0.2, RabbitMQ 2.1.4, Redis 2.10.5. These are all running within Docker containers in Linux (Fedora 25). I have tried many combinations of things that I found in Celery docs and from this site. The only combination that has worked thus far is below. However, it only runs the periodic task initially when the application starts, but the schedule is ignored thereafter. I have absolutely confirmed that the scheduled task does not run again after the initial time.

My (almost-working) setup that only runs one-time:

settings.py:

import os
from kombu import Exchange, Queue

INSTALLED_APPS = (
   ...
   'django_celery_beat',
   ...
)
BROKER_URL = 'amqp://{user}:{password}@{hostname}/{vhost}/'.format(
    user=os.environ['RABBIT_USER'],
    password=os.environ['RABBIT_PASS'],
    hostname=RABBIT_HOSTNAME,
    vhost=os.environ.get('RABBIT_ENV_VHOST', ''),
)

# We don't want to have dead connections stored on rabbitmq, so we have to negotiate using heartbeats
BROKER_HEARTBEAT = '?heartbeat=30'
if not BROKER_URL.endswith(BROKER_HEARTBEAT):
    BROKER_URL += BROKER_HEARTBEAT

BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_TIMEOUT = 10

# Celery configuration

# configure queues, currently we have only one
CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
)

# Sensible settings for celery
CELERY_ALWAYS_EAGER = False
CELERY_ACKS_LATE = True
CELERY_TASK_PUBLISH_RETRY = True
CELERY_DISABLE_RATE_LIMITS = False

# By default we will ignore result
# If you want to see results and try out tasks interactively, change it to False
# Or change this setting on tasks level
CELERY_IGNORE_RESULT = True
CELERY_SEND_TASK_ERROR_EMAILS = False
CELERY_TASK_RESULT_EXPIRES = 600

# Set redis as celery result backend
CELERY_RESULT_BACKEND = 'redis://%s:%d/%d' % (REDIS_HOST, REDIS_PORT, REDIS_DB)
CELERY_REDIS_MAX_CONNECTIONS = 1

# Don't use pickle as serializer, json is much safer
CELERY_TASK_SERIALIZER = "json"
CELERY_RESULT_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ['application/json']
CELERYD_HIJACK_ROOT_LOGGER = False
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_MAX_TASKS_PER_CHILD = 1000
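The BROKER_URL logic above can be pulled into a small helper so the resulting string is easy to check. This is a sketch, not part of the original settings: `build_broker_url` is a hypothetical name, and percent-encoding the user and password with `urllib.parse.quote` is an addition, on the assumption that a password containing `@`, `:` or `/` would otherwise break the URL. The format string and the `?heartbeat=30` suffix mirror the original.

```python
from urllib.parse import quote


def build_broker_url(user, password, hostname, vhost='', heartbeat=30):
    """Hypothetical helper mirroring the BROKER_URL logic in settings.py."""
    # Assumption: credentials may contain URL-reserved characters, so they are
    # percent-encoded; safe='' makes quote() encode '/' as well.
    url = 'amqp://{user}:{password}@{hostname}/{vhost}/'.format(
        user=quote(user, safe=''),
        password=quote(password, safe=''),
        hostname=hostname,
        vhost=vhost,  # left as-is, exactly like the original format string
    )
    # Same heartbeat query string the original settings append.
    return url + '?heartbeat={}'.format(heartbeat)


print(build_broker_url('myuser', 'p@ss:word', 'rabbit'))
# → amqp://myuser:p%40ss%3Aword@rabbit//?heartbeat=30
```

With an empty vhost the path becomes `//`, which is what the original settings produce as well.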

celeryconf.py

# coding=UTF8
from __future__ import absolute_import
import os
from celery import Celery
from django.conf import settings

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "web_portal.settings")
app = Celery('web_portal')
CELERY_TIMEZONE = 'UTC'
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

tasks.py

from celery.schedules import crontab
from .celeryconf import app as celery_app

@celery_app.on_after_finalize.connect
def setup_periodic_tasks(sender, **kwargs):
    # Calls email_scanner every 10 minutes
    sender.add_periodic_task(
        crontab(hour='*',
                minute='*/10',
                day_of_week='*',
                day_of_month='*'),
        email_scanner.delay(),
    )

@celery_app.task
def email_scanner():
    dispatch_list = scanning.email_scan()
    for dispatch in dispatch_list:
        validate_dispatch.delay(dispatch)
    return

run_celery.sh -- used to start Celery beat and the Celery worker from docker-compose.yml

#!/bin/sh

# wait for RabbitMQ server to start
sleep 10

cd web_portal
# run Celery worker for our project myproject with Celery configuration stored in Celeryconf
su -m myuser -c "celery beat -l info --pidfile=/tmp/celerybeat-web_portal.pid -s /tmp/celerybeat-schedule &"
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default@%h"

I have also tried using CELERYBEAT_SCHEDULE in settings.py in lieu of the @celery_app.on_after_finalize.connect decorator and block in tasks.py, but then the scheduled task never ran even once.

settings.py (not working at all scenario)

(same as before except also including the following)

from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'email-scanner-every-10-minutes': {
        'task': 'tasks.email_scanner',
        'schedule': timedelta(minutes=10)
    },
}

The Celery 4.0.2 documentation online presumes that I should instinctively know many givens, but I am new in this environment. If anybody knows where I can find a tutorial OTHER THAN docs.celeryproject.org and http://django-celery-beat.readthedocs.io/en/latest/ which both assume that I am already a Django master, I would be grateful. Or let me know of course if you see something obviously wrong in my setup. Thanks!

Answer 1:

I found a solution that works. I could not get CELERYBEAT_SCHEDULE or the Celery task decorators to work, and I suspect that this may be at least partially due to the manner in which I started the Celery beat process.

The working solution goes the whole nine yards and uses the Django database scheduler from django-celery-beat. I downloaded the GitHub project "https://github.com/celery/django-celery-beat" and incorporated all of its code as another "app" in my project. This enabled Django admin access to maintain the crontab / interval / periodic task tables via a browser. I also modified my run_celery.sh as follows:

#!/bin/sh

# wait for RabbitMQ server to start
sleep 10
# run Celery beat (database scheduler) and the Celery worker for our project, with Celery configuration stored in celeryconf
celery beat -A web_portal.celeryconf -l info --pidfile=/tmp/celerybeat-web_portal.pid -S django --detach
su -m myuser -c "celery worker -A web_portal.celeryconf -Q default -n default@%h -l info "

After adding a scheduled task via the Django admin web interface, the scheduler started working fine.