Django related objects are missing from Celery task

Posted 2020-04-21 07:54

Strange behavior that I don't know how to explain. I have a model, Track, with some related points. I call a Celery task to perform some calculations with the points; they seem perfectly reachable in the method itself, but unavailable in the Celery task.

from celery import shared_task

# Track and fill_with_points are defined elsewhere in the project
@shared_task
def my_task(track):
    print('in the task', track.id, track.points.all().count())

def some_method():
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print('before the task', t.id, t.points.all().count())
    my_task.delay(t)

That prints the following:

before the task, 21346, 2971
in the task, 21346, 0

The strange thing is that when I put a time.sleep(10) at the top of my_task, or before calling my_task at all, it works fine, as if there were some race condition. But the first printed line clearly shows that the points are available in the database, since it issues a SELECT query (track.points.all().count()).

3 answers
够拽才男人
Post #2 · 2020-04-21 08:37

So, I've solved it using django-transaction-hooks. It still feels a bit scary to replace my DB backend, but django-celery-transactions seems to be broken in Django 1.6. Now my setup looks like this:

settings.py:

DATABASES = {
    'default': {
        # Drop-in backend from django-transaction-hooks that adds connection.on_commit()
        'ENGINE': 'transaction_hooks.backends.postgresql_psycopg2',
        'NAME': 'foo',
    },
}
SOUTH_DATABASE_ADAPTERS = {'default': 'south.db.postgresql_psycopg2'}  # this is required, or South breaks

models.py:

from celery import shared_task
from django.db import connection

@shared_task
def my_task(track):
    print('in the task', track.id, track.points.all().count())

def some_method():
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print('before the task', t.id, t.points.all().count())
    # Queue the task only once the surrounding transaction has committed
    # (on_commit is provided by the django-transaction-hooks backend)
    connection.on_commit(lambda: my_task.delay(t))

Results:

before the task, 21346, 2971
in the task, 21346, 2971

It still seems strange that such a common use case has no native celery or Django solution.
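For reference, Django 1.9 and later ship this same idea natively as `django.db.transaction.on_commit()`. The mechanism is small enough to sketch without Django: below is a toy version built on Python's sqlite3 module, not django-transaction-hooks' actual implementation. `HookedConnection`, `worker_count`, and the `points` table are hypothetical names. Callbacks registered with `on_commit()` are held until `commit()` succeeds and discarded on `rollback()`, so the stand-in "task" only runs once the rows are visible to other connections.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in database file (PostgreSQL in the answer above).
DB_PATH = os.path.join(tempfile.mkdtemp(), "hooks.sqlite3")


class HookedConnection:
    """Toy wrapper: defer callbacks until the current transaction commits."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self._pending = []

    def on_commit(self, func):
        self._pending.append(func)

    def commit(self):
        self.conn.commit()
        # Run hooks only after the commit succeeded; clear the queue first.
        pending, self._pending = self._pending, []
        for func in pending:
            func()

    def rollback(self):
        self.conn.rollback()
        self._pending = []  # a rolled-back transaction never fires its hooks


def worker_count(track_id):
    """Stand-in for the Celery task: a separate connection counts the points."""
    conn = sqlite3.connect(DB_PATH)
    try:
        return conn.execute(
            "SELECT COUNT(*) FROM points WHERE track_id = ?", (track_id,)
        ).fetchall()[0][0]
    finally:
        conn.close()


results = []  # what the "task" saw each time it ran
db = HookedConnection(DB_PATH)
db.conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, track_id INTEGER)")
db.commit()

# Transaction 1: insert points, register the task, then commit.
db.conn.executemany("INSERT INTO points (track_id) VALUES (?)", [(1,)] * 3)
db.on_commit(lambda: results.append(worker_count(1)))
db.commit()

# Transaction 2: same pattern, but rolled back, so the hook never runs.
db.conn.executemany("INSERT INTO points (track_id) VALUES (?)", [(2,)] * 5)
db.on_commit(lambda: results.append(worker_count(2)))
db.rollback()

print(results)  # prints: [3]
```

On Django 1.9+, the last line of `some_method` above would become `transaction.on_commit(lambda: my_task.delay(t))` after `from django.db import transaction`, with no custom database backend needed.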

仙女界的扛把子
Post #3 · 2020-04-21 08:39

I'm going to assume this is due to transaction isolation.

Django transactions are often tied to requests (for example, with the ATOMIC_REQUESTS setting), and while a transaction is active, no other process will see its changes until it is committed. If you're in the middle of a save method and quite a lot of other work happens before the request finishes, it seems likely that Celery starts processing the task before the transaction is committed. You could fix this by committing manually or by delaying the task.
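The isolation effect can be reproduced without Django or Celery. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `points` table and the `count_points` helper are hypothetical. One connection plays the Django process and a second, separate connection plays the Celery worker: the writer sees its own uncommitted rows (which is why the "before the task" line in the question looks correct), while the worker sees nothing until the commit.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in database file (PostgreSQL in the original question).
DB_PATH = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")


def count_points(path, track_id):
    """Play the Celery worker: open a separate connection and count the rows."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute(
            "SELECT COUNT(*) FROM points WHERE track_id = ?", (track_id,)
        ).fetchall()[0][0]
    finally:
        conn.close()


# Play the Django process: create the schema and commit it first.
writer = sqlite3.connect(DB_PATH)
writer.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, track_id INTEGER)")
writer.commit()

# Insert rows; the sqlite3 module opens a transaction implicitly before INSERT.
writer.executemany("INSERT INTO points (track_id) VALUES (?)", [(1,)] * 3)

# The writer's own connection sees its uncommitted rows...
seen_by_writer = writer.execute(
    "SELECT COUNT(*) FROM points WHERE track_id = 1"
).fetchall()[0][0]
# ...but a different connection does not.
seen_by_worker_before_commit = count_points(DB_PATH, 1)

writer.commit()  # only now do other connections see the rows
seen_by_worker_after_commit = count_points(DB_PATH, 1)
writer.close()

print(seen_by_writer, seen_by_worker_before_commit, seen_by_worker_after_commit)
# prints: 3 0 3
```

Under this reading, the time.sleep(10) in the question only appears to help because the commit usually lands within those ten seconds; queueing the task after the commit removes the race instead of just narrowing it.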

冷血范
Post #4 · 2020-04-21 08:54

You should NEVER pass model objects to Celery tasks. The object is serialized when the task is queued, so the worker receives a copy that is detached from its own database connection and may be stale (or may not serialize at all), which means it may not be available or may behave badly. What you should do is send the id instead, something like track_id, and then fetch the object from the database with a query inside the task. That should most likely solve your problem.

from celery import shared_task

@shared_task
def my_task(track_id):
    track = Track.objects.get(pk=track_id)  # re-fetch the object inside the worker
    print('in the task', track.id, track.points.all().count())

def some_method():
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print('before the task', t.id, t.points.all().count())
    my_task.delay(t.id)  # pass the id here, not the object
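The "send the id, re-fetch in the worker" pattern can be sketched without Celery by simulating the broker with a JSON round trip (Celery 4.0 and later default to the JSON serializer, which cannot encode model instances). `FakeTrack`, `enqueue`, and the in-memory `DATABASE` dict are hypothetical stand-ins for the model, the broker, and PostgreSQL. One caveat: a real re-query only sees committed rows, so on its own this does not remove the commit race described in the question; it is usually combined with queueing the task after the commit.

```python
import json

# Hypothetical in-memory "database": track id -> list of point ids.
DATABASE = {21346: list(range(2971))}


class FakeTrack:
    """Stand-in for a Django model instance."""

    def __init__(self, track_id):
        self.id = track_id


def enqueue(*args):
    """Simulate sending task arguments through a broker as a JSON message."""
    return json.loads(json.dumps({"args": args}))["args"]


def my_task(track_id):
    """The worker receives only the id and looks the data up itself."""
    points = DATABASE.get(track_id, [])
    return track_id, len(points)


t = FakeTrack(21346)

try:
    enqueue(t)  # passing the object itself: not JSON-serializable
    object_rejected = False
except TypeError:
    object_rejected = True

(track_id,) = enqueue(t.id)  # passing the id serializes cleanly
result = my_task(track_id)
print(object_rejected, result)  # prints: True (21346, 2971)
```

The message only ever carries a plain integer, so the worker always reads the current state from the database rather than a snapshot taken when the task was queued.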