Django update queryset with annotation

Published 2020-02-17 06:56

Question:

I want to update all rows in a queryset using an annotated value.

I have two simple models:

class Relation(models.Model):
    rating = models.IntegerField(default=0)

class SignRelation(models.Model):
    relation = models.ForeignKey(Relation, related_name='sign_relations', on_delete=models.CASCADE)
    rating = models.IntegerField(default=0)

I want to avoid this code:

from django.db.models import Sum

for relation in Relation.objects.annotate(total_rating=Sum('sign_relations__rating')):
    relation.rating = relation.total_rating or 0
    relation.save()

and instead do the update in a single SQL query, using something like this:

Relation.objects.update(rating=Sum('sign_relations__rating'))

This doesn't work:

TypeError: int() argument must be a string or a number, not 'Sum'

or

Relation.objects.annotate(total_rating=Sum('sign_relations__rating')).update(rating=F('total_rating'))

This also doesn't work:

DatabaseError: missing FROM-clause entry for table "relations_signrelation"
LINE 1: UPDATE "relations_relation" SET "rating" = SUM("relations_si...

Is it possible to use Django's ORM for this purpose? The docs say nothing about using update() and annotate() together.

Answer 1:

For Django 1.11+ you can use Subquery:

from django.db.models import OuterRef, Subquery, Sum

Relation.objects.update(
    rating=Subquery(
        Relation.objects.filter(
            id=OuterRef('id')
        ).annotate(
            total_rating=Sum('sign_relations__rating')
        ).values('total_rating')[:1]
    )
)

This code produces the same SQL as the one proposed by Tomasz Jakub Rup, but without using a RawSQL expression (the Django documentation warns against RawSQL because of the risk of SQL injection).
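
One caveat worth sketching: for a Relation with no related SignRelation rows, SUM yields NULL, whereas the loop in the question falls back to 0. Wrapping the subquery in Coalesce should restore that fallback. The helper below is only an illustration: update_ratings is a hypothetical name, and it assumes it is passed the question's Relation model with the sign_relations related name.

```python
def update_ratings(relation_model):
    """Set rating to the sum of the related sign_relations ratings, or 0 if none.

    relation_model is assumed to be the question's Relation model class.
    Returns the number of rows matched by the UPDATE.
    """
    # Imported inside the function so the helper can be defined
    # without importing Django up front.
    from django.db.models import IntegerField, OuterRef, Subquery, Sum
    from django.db.models.functions import Coalesce

    # Correlated subquery: the total rating for the row being updated.
    totals = (
        relation_model.objects
        .filter(pk=OuterRef('pk'))
        .annotate(total_rating=Sum('sign_relations__rating'))
        .values('total_rating')[:1]
    )
    # Coalesce replaces the NULL produced by an empty SUM with 0.
    return relation_model.objects.update(
        rating=Coalesce(Subquery(totals), 0, output_field=IntegerField())
    )
```

It would be called as update_ratings(Relation) and, like the code above, should run as a single UPDATE statement.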

Update

I published an article based on this answer with more in-depth explanations:

"Updating a Django queryset with annotation and subquery" on paulox.net



Answer 2:

The SQL UPDATE statement doesn't support GROUP BY; see, e.g., the PostgreSQL and SQLite docs.

You need something like this:

UPDATE relation
SET rating = (SELECT SUM(rating)
              FROM sign_relation
              WHERE relation_id = relation.id)
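
The statement above can be tried outside Django. Here is a minimal sketch using Python's built-in sqlite3 module, with throwaway in-memory tables named as in the SQL above; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relation (id INTEGER PRIMARY KEY, rating INTEGER DEFAULT 0);
    CREATE TABLE sign_relation (
        id INTEGER PRIMARY KEY,
        relation_id INTEGER NOT NULL REFERENCES relation(id),
        rating INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO relation (id) VALUES (1), (2), (3);
    -- Relation 3 deliberately has no sign_relation rows.
    INSERT INTO sign_relation (relation_id, rating) VALUES (1, 5), (1, 7), (2, -3);
""")

# One UPDATE; the correlated subquery is evaluated for each relation row.
conn.execute("""
    UPDATE relation
    SET rating = (SELECT SUM(sr.rating)
                  FROM sign_relation AS sr
                  WHERE sr.relation_id = relation.id)
""")

ratings = dict(conn.execute("SELECT id, rating FROM relation ORDER BY id"))
print(ratings)  # {1: 12, 2: -3, 3: None}
```

Relation 3 ends up with NULL (None in Python) because SUM over zero rows is NULL; wrapping the subquery in COALESCE(..., 0) would store 0 instead.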

The equivalent in the Django ORM:

from django.db.models.expressions import RawSQL

Relation.objects.update(
    rating=RawSQL(
        'SELECT SUM(rating) FROM signrelation WHERE relation_id = relation.id',
        [],
    )
)

or:

from django.db.models import Sum
from django.db.models.expressions import RawSQL

# Build the inner SELECT with the ORM, then embed its SQL via RawSQL.
subquery = (
    SignRelation.objects
    .extra(where=['relation_id = relation.id'])
    .values('relation')
    .annotate(sum_rating=Sum('rating'))
    .values('sum_rating')
)
Relation.objects.update(rating=RawSQL(subquery.query, []))


Answer 3:

You can define your own custom objects manager:

class RelationManager(models.Manager):
    def annotated(self, *args, **kwargs):
        queryset = super(RelationManager, self).get_queryset()
        # Iterating evaluates and caches the queryset, so the values set
        # here are visible to the caller (but are not saved to the database).
        for obj in queryset:
            obj.rating = ...  # do something
        return queryset

class Relation(models.Model):
    rating = models.IntegerField(default=0)
    rating_objects = RelationManager()

Then in your code:

q = Relation.rating_objects.annotated()

Add args/kwargs to customise what this manager returns.



Answer 4:

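A workaround for PostgreSQL (here qs is a queryset that selects id and bar, and foo and bar stand for the table and column to update):

from django.db import connection

with connection.cursor() as cursor:
    # Compile the queryset to SQL plus its parameters, then reuse it as a CTE.
    sql, params = qs.query.sql_with_params()
    cursor.execute("""
        WITH qs AS ({})
        UPDATE foo SET bar = qs.bar
        FROM qs WHERE qs.id = foo.id
    """.format(sql), params)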


Answer 5:

You really can't do this. Take a look at the code for update and follow it through for some fine reading.

Honestly, what's wrong with placing something like this in a Manager definition? Put those three lines you don't want in your view into a manager method, and call it as necessary. You'd also be doing much less "magic", and the next developer who looks at your code won't have to resort to a few WTFs. :)

Also, I was curious, and it looks like you can use a SQL JOIN with UPDATE statements, but it's some classic SQL hackery. So if you're so inclined, you can use Django's raw SQL functionality for that. ;)



Answer 6:

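If you want to reduce the overhead of many separate database writes, you can wrap the loop in transaction.atomic. This still issues one UPDATE per row, but all of them are committed in a single transaction.

Read more in the Django documentation: https://docs.djangoproject.com/en/1.9/topics/db/transactions/#controlling-transactions-explicitly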