I use celery to update RSS feeds in my news aggregation site. I use one @task for each feed, and things seem to work nicely.
There's a detail I'm not sure I'm handling well, though: all feeds are updated once every minute with a @periodic_task, but what if a feed is still updating from the last periodic task when a new one is started? (For example, if the feed is really slow, or offline, and the task is held in a retry loop.)
Currently I store task results and check their status like this:
import socket
from datetime import timedelta
from celery.decorators import task, periodic_task
from aggregator.models import Feed

_results = {}

@periodic_task(run_every=timedelta(minutes=1))
def fetch_articles():
    for feed in Feed.objects.all():
        if feed.pk in _results:
            if not _results[feed.pk].ready():
                # The task is not finished yet
                continue
        _results[feed.pk] = update_feed.delay(feed)

@task()
def update_feed(feed):
    try:
        feed.fetch_articles()
    except socket.error as exc:
        update_feed.retry(args=[feed], exc=exc)
Maybe there is a more sophisticated/robust way of achieving the same result using some Celery mechanism that I missed?
Based on MattH's answer, you could use a decorator like this:
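A sketch of such a decorator, assuming Django's cache framework with a backend whose add is atomic (e.g. memcached); the lock-key prefix and the use of the function name to build the key are just one reasonable choice:

from functools import wraps
from django.core.cache import cache

def single_instance_task(timeout):
    """Skip execution if another instance of the decorated task holds the lock."""
    def task_exc(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            lock_id = "celery-single-instance-" + func.__name__
            # cache.add is atomic: it only succeeds if the key doesn't exist yet
            acquire_lock = lambda: cache.add(lock_id, "true", timeout)
            release_lock = lambda: cache.delete(lock_id)
            if acquire_lock():
                try:
                    func(*args, **kwargs)
                finally:
                    release_lock()
        return wrapper
    return task_exc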
then, use it like so...
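The decorator goes below @periodic_task so the lock wraps the task body; update_feed is the task from the question, and the 10-minute lock timeout is an arbitrary safety margin:

from datetime import timedelta
from celery.decorators import periodic_task
from aggregator.models import Feed

@periodic_task(run_every=timedelta(minutes=1))
@single_instance_task(60 * 10)  # lock expires after 10 minutes if a run hangs
def fetch_articles():
    for feed in Feed.objects.all():
        update_feed.delay(feed)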
From the official documentation: Ensuring a task is only executed one at a time.
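The gist of that recipe is to use the atomicity of cache.add as a mutex around the task body, with a per-feed key so distinct feeds can still update in parallel. A condensed sketch, assuming Django's cache and the Feed model from the question (the recipe itself hashes the feed URL for the key; LOCK_EXPIRE guards against a dead worker leaving the lock behind):

from django.core.cache import cache
from celery.decorators import task

LOCK_EXPIRE = 60 * 5  # lock expires after 5 minutes even if a worker dies mid-run

@task()
def update_feed(feed):
    lock_id = "update-feed-lock-%d" % feed.pk
    # cache.add only succeeds if the key is absent, so exactly one caller wins
    if cache.add(lock_id, "true", LOCK_EXPIRE):
        try:
            feed.fetch_articles()
        finally:
            cache.delete(lock_id)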
This solution is for Celery running on a single host with concurrency greater than 1. Other kinds of locks that avoid dependencies like Redis, apart from file-based ones, don't work with concurrency greater than 1.
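A sketch of that single-host approach, using fcntl.flock in non-blocking mode (POSIX-only; the lock path and task shape are illustrative):

import fcntl
from celery.decorators import task

LOCK_PATH = "/tmp/update_feeds.lock"  # illustrative; any host-local path works

@task()
def update_feed(feed):
    with open(LOCK_PATH, "w") as lock_file:
        try:
            # Non-blocking exclusive lock; raises IOError if another process holds it
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except IOError:
            return  # another worker on this host is already updating
        feed.fetch_articles()
        # the lock is released when the file is closed at the end of the with block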
Using https://pypi.python.org/pypi/celery_once seems to do the job really nicely, including reporting errors and testing against some parameters for uniqueness.
You can do things like:
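Roughly following the celery_once README: the QueueOnce base class refuses to queue a task whose arguments match one already pending, and once={'graceful': True} makes the duplicate a silent no-op instead of an error (the app module and task body are illustrative):

from celery_once import QueueOnce
from myapp.celery import app  # hypothetical module holding your Celery app

@app.task(base=QueueOnce, once={'graceful': True})
def update_feed(feed_id):
    # duplicate calls with the same feed_id are dropped until this run finishes
    ...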
which just needs the following settings in your project:
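In recent celery_once releases the configuration lives on the app's conf; the broker URL, Redis URL, and timeout below are placeholders:

from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')
app.conf.ONCE = {
    'backend': 'celery_once.backends.Redis',
    'settings': {
        'url': 'redis://localhost:6379/0',
        'default_timeout': 60 * 60,  # locks expire after an hour by default
    },
}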
If you're looking for an example that doesn't use Django, then try this example (caveat: uses Redis instead, which I was already using).
The decorator code is as follows (full credit to the author of the article, go read it):
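A sketch of that pattern with redis-py: acquire a non-blocking lock named by the caller, run the wrapped task only if we got it, and always release what we hold (REDIS_CLIENT is assumed to point at a local Redis; key and timeout are left as decorator parameters):

from functools import wraps
import redis

REDIS_CLIENT = redis.Redis()  # assumes a local Redis instance

def only_one(function=None, key="", timeout=None):
    """Enforce that only one instance of the task runs at a time."""
    def _dec(run_func):
        @wraps(run_func)
        def _caller(*args, **kwargs):
            ret_value = None
            have_lock = False
            lock = REDIS_CLIENT.lock(key, timeout=timeout)
            try:
                # Non-blocking: give up immediately if another worker holds the lock
                have_lock = lock.acquire(blocking=False)
                if have_lock:
                    ret_value = run_func(*args, **kwargs)
            finally:
                if have_lock:
                    lock.release()
            return ret_value
        return _caller
    return _dec(function) if function is not None else _dec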