Django: updating many objects with a per-object calculation

Posted 2019-08-09 09:36

This question is a continuation of one I asked yesterday. I'm still not sure whether a post_save handler or a second Celery task is the better way to update many objects based on the results of the first Celery task, but I plan to test performance down the line. Here's a recap of what's happening:

Celery task, every 30s:
Update page_count field of Book object based on conditions
                 |
post_save(Book)  |  
                 V
Update some field on all Reader objects w/ foreign key to updated Book 
(update will have different results per-Reader, thousands of Readers could be FKed to Book)

The first task could save ~10 Book objects, and each of those saves requires an update of all its related Reader objects.

Whichever proves better, post_save or another task, it must accomplish the same thing: update potentially tens to hundreds of thousands of objects in a table, with a different result for each object. My choice between post_save and a Celery task may come down to which one actually lets me accomplish this.

Since I can't just use a few queryset update() commands, I need to somehow call a method or function that calculates the value of a field based on the result of the first Celery task as well as some of the values in the object. Here's an example:

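from django.db import models

class Reader(models.Model):
    # Book is assumed to be defined earlier in the same models.py.
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    pages_read = models.IntegerField(default=0)
    book_finished = models.BooleanField(default=False)

    def determine_book_finished(self):
        # Finished once every page of the related Book has been read.
        # This only sets the attribute; the object still has to be saved.
        self.book_finished = self.pages_read == self.book.page_count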

This is a contrived example, but if page_count was updated in the first task, I want every Reader foreign-keyed to that Book to have book_finished recalculated, and looping over a queryset seems like a really inefficient way to go about it.
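If the per-Reader value really has to be computed in Python, the costly part of a naive loop is typically the separate save(), with its own query, for every object, rather than the loop itself. A minimal sketch of the alternative, using the standard-library sqlite3 module instead of Django so it runs on its own: rows are read in primary-key batches, the value is computed in Python, and each batch is written back with one executemany() call. The schema and the helpers make_db() and recompute_in_batches() are hypothetical. In Django, the rough equivalent would be collecting the modified instances and writing them with QuerySet.bulk_update(objs, ['book_finished'], batch_size=...), which is available from Django 2.2.

```python
import sqlite3


def make_db(page_count, pages_read_values):
    """Build a throwaway in-memory DB shaped like Book/Reader (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, page_count INTEGER)")
    conn.execute(
        "CREATE TABLE reader (id INTEGER PRIMARY KEY, book_id INTEGER, "
        "pages_read INTEGER, book_finished INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO book (id, page_count) VALUES (1, ?)", (page_count,))
    conn.executemany(
        "INSERT INTO reader (book_id, pages_read) VALUES (1, ?)",
        [(p,) for p in pages_read_values],
    )
    conn.commit()
    return conn


def recompute_in_batches(conn, book_id, batch_size=1000):
    """Recompute book_finished for one book's readers, batch by batch."""
    (page_count,) = conn.execute(
        "SELECT page_count FROM book WHERE id = ?", (book_id,)
    ).fetchone()
    last_id = 0
    while True:
        # Keyset pagination: each batch starts after the last id seen.
        rows = conn.execute(
            "SELECT id, pages_read FROM reader "
            "WHERE book_id = ? AND id > ? ORDER BY id LIMIT ?",
            (book_id, last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        # Same rule as determine_book_finished(), computed in Python.
        updates = [(int(pages_read == page_count), reader_id)
                   for reader_id, pages_read in rows]
        # One executemany() call per batch instead of one save() per object.
        conn.executemany("UPDATE reader SET book_finished = ? WHERE id = ?", updates)
        last_id = rows[-1][0]
    conn.commit()
```

Memory stays bounded by batch_size, since only one batch of rows is held in Python at a time.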

My thought was to somehow call a model method such as determine_book_finished() on an entire queryset at once, but I can't find any documentation on how to do something like that; custom querysets don't appear to be intended for operating on individual objects in the queryset beyond the built-in update() capability.
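When the per-row rule can be written in terms of column values, as determine_book_finished() can, the database can evaluate it for every row in a single UPDATE, so nothing is looped over in Python. Django exposes this through update() combined with conditional expressions (Case, When, Value); for a single Book, something like Reader.objects.filter(book=book).update(book_finished=Case(When(pages_read=book.page_count, then=Value(True)), default=Value(False), output_field=BooleanField())) should work, passing page_count as a plain value because update() generally does not accept F() references across the foreign key. The sketch below shows the kind of SQL this amounts to, again using sqlite3 with a hypothetical schema and helper names, and covers several updated books in one statement through a correlated subquery.

```python
import sqlite3


def make_db(books):
    """books maps book_id -> (page_count, [pages_read, ...]); hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, page_count INTEGER)")
    conn.execute(
        "CREATE TABLE reader (id INTEGER PRIMARY KEY, book_id INTEGER, "
        "pages_read INTEGER, book_finished INTEGER NOT NULL DEFAULT 0)"
    )
    for book_id, (page_count, pages_read_values) in books.items():
        conn.execute("INSERT INTO book (id, page_count) VALUES (?, ?)",
                     (book_id, page_count))
        conn.executemany(
            "INSERT INTO reader (book_id, pages_read) VALUES (?, ?)",
            [(book_id, p) for p in pages_read_values],
        )
    conn.commit()
    return conn


def recompute_set_based(conn, book_ids):
    """Recompute book_finished for every reader of the given books in one UPDATE.

    The CASE expression is evaluated by the database for each row; the
    correlated subquery looks up the current page_count of that row's book.
    """
    book_ids = list(book_ids)
    if not book_ids:
        return
    # Only "?" placeholders are interpolated; the ids are passed as parameters.
    placeholders = ", ".join("?" for _ in book_ids)
    conn.execute(
        f"""
        UPDATE reader
        SET book_finished = CASE
            WHEN pages_read = (SELECT page_count FROM book
                               WHERE book.id = reader.book_id)
            THEN 1 ELSE 0
        END
        WHERE book_id IN ({placeholders})
        """,
        book_ids,
    )
    conn.commit()
```

This only works while the rule stays expressible in SQL; rules that need arbitrary Python logic fall back to the batched approach.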

This post using Celery is the most promising thing I've found, and since Django signals are synchronous, using another Celery task would have the added benefit of not blocking anything else. So even though I'd still need to loop over a queryset, the loop would run asynchronously, and each queryset needing an update could be handled by a separate task, hopefully in parallel.
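A rough sketch of the fan-out idea: one independent job per updated Book, run concurrently. Here concurrent.futures.ThreadPoolExecutor stands in for Celery workers and plain dataclasses stand in for the models; recompute_readers_for_book() and fan_out() are hypothetical names. Because each Reader points at exactly one Book, the per-book jobs touch disjoint sets of objects and need no coordination. With Celery itself, each job would more likely be a task that receives only the book's ID and queries its own Readers, and the first task could dispatch them together, for example with Celery's group() primitive.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Book:
    id: int
    page_count: int


@dataclass
class Reader:
    id: int
    book_id: int
    pages_read: int
    book_finished: bool = False


def recompute_readers_for_book(book, readers):
    """One unit of work, standing in for a hypothetical per-book Celery task.

    Returns (book id, number of readers whose value changed); only those
    readers would need to be written back to the database.
    """
    changed = 0
    for reader in readers:
        new_value = reader.pages_read == book.page_count
        if new_value != reader.book_finished:
            reader.book_finished = new_value
            changed += 1
    return book.id, changed


def fan_out(updated_books, readers_by_book, max_workers=4):
    """Run one job per updated book concurrently and collect the results.

    Each Reader belongs to exactly one Book, so no two jobs ever modify
    the same Reader object.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(recompute_readers_for_book, book,
                        readers_by_book.get(book.id, []))
            for book in updated_books
        ]
        return dict(future.result() for future in futures)
```

Counting only changed rows also hints at a cheap optimization: readers whose value did not change do not need to be written at all.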

On the other hand, this question seems to have a solution too: register the method with the post_save signal, which presumably would run the method on all objects after receiving the signal. Would this be workable with thousands of objects needing an update, plus other Books being updated by the same task, each with thousands of associated Readers of its own?
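The main caveat with post_save is that Django calls signal receivers synchronously, from inside Model.save(), in the same thread. The toy below reproduces only that ordering; ToySignal is a hypothetical stand-in, not Django's Signal class, although its connect()/send() shape is similar. If the first Celery task saves ~10 Books, each save() would not return until its receiver had finished updating that Book's Readers, so the task's run time grows with the total number of Readers involved.

```python
class ToySignal:
    """Hypothetical stand-in for a Django Signal: receivers are plain
    callables invoked one after another by send(), in the caller's thread."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Like Django's Signal.send(), return (receiver, response) pairs.
        return [(r, r(sender=sender, **kwargs)) for r in self.receivers]


post_save = ToySignal()
log = []


class Book:
    def __init__(self, page_count):
        self.page_count = page_count

    def save(self):
        log.append("row written")
        # Django sends post_save from inside save(), after the write.
        post_save.send(sender=Book, instance=self, created=False)
        log.append("save() returned")


def update_readers(sender, instance, **kwargs):
    # Placeholder for the per-Reader recalculation of this Book's Readers.
    log.append(f"readers updated for page_count={instance.page_count}")


post_save.connect(update_readers)
```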

Is there a best practice for doing what I'm trying to do here?

EDIT: I realized I could approach this another way: make book_finished a property computed at runtime rather than a stored field.

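@property
def book_finished(self):
    # Computed on every access instead of being stored in the database.
    if self.pages_read == self.book.page_count:
        if self.book.page_count == self.book.planned_pages:
            return True
    # Covers both failing conditions, so the property never returns None.
    return False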

This is close enough to my actual code; in the real version, the first if branch contains a couple of elif branches, each with its own if-else, for a maximum nesting depth of three ifs.
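The trade-off behind this EDIT can be shown with plain Python classes standing in for the models (StoredReader and PropertyReader are hypothetical names, and this is not Django code): a stored book_finished goes stale as soon as page_count changes and stays wrong until something recomputes it, which is exactly the mass update this question is about, while the property is re-derived on every access.

```python
from dataclasses import dataclass


@dataclass
class Book:
    page_count: int
    planned_pages: int


@dataclass
class StoredReader:
    """Original design: book_finished is stored and must be recomputed."""
    book: Book
    pages_read: int
    book_finished: bool = False

    def determine_book_finished(self):
        self.book_finished = self.pages_read == self.book.page_count


@dataclass
class PropertyReader:
    """EDIT design: book_finished is derived from current data on each access."""
    book: Book
    pages_read: int

    @property
    def book_finished(self):
        if self.pages_read == self.book.page_count:
            if self.book.page_count == self.book.planned_pages:
                return True
        return False
```

For example, with a book at 300 pages and both readers at 300 pages read, raising page_count and planned_pages to 320 leaves StoredReader.book_finished at True until determine_book_finished() is called again, whereas PropertyReader.book_finished immediately becomes False.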

Until I can generate a lot of test data and simulate many simultaneous users, I may stick with this option, since it definitely works (for now). I don't really like having the property recalculated on every access, but from some quick research, it doesn't seem to be particularly slow.
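In Django, the per-access cost of this property usually comes less from the if branches than from self.book: for a Reader loaded from a plain Reader.objects queryset without select_related('book'), the first access to self.book runs a query, so listing N Readers can mean N extra queries. A property also can't be used in filter() or order_by(), since the database knows nothing about it. If either matters, the same rule can be computed in the SELECT; in Django that would be an annotate() with Case/When and F('book__page_count'), under a name such as is_finished so it doesn't clash with the property. A minimal sqlite3 sketch of the resulting query, with a hypothetical schema and helper names:

```python
import sqlite3


def make_db(page_count, planned_pages, pages_read_values):
    """One book plus its readers in a throwaway in-memory DB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, page_count INTEGER, "
        "planned_pages INTEGER)"
    )
    conn.execute(
        "CREATE TABLE reader (id INTEGER PRIMARY KEY, book_id INTEGER, "
        "pages_read INTEGER)"
    )
    conn.execute(
        "INSERT INTO book (id, page_count, planned_pages) VALUES (1, ?, ?)",
        (page_count, planned_pages),
    )
    conn.executemany(
        "INSERT INTO reader (book_id, pages_read) VALUES (1, ?)",
        [(p,) for p in pages_read_values],
    )
    conn.commit()
    return conn


def readers_with_status(conn, book_id):
    """Return [(reader_id, is_finished), ...] for one book in a single query.

    The JOIN brings the book's page_count and planned_pages alongside each
    reader row, so there is no separate Book lookup per Reader.
    """
    return conn.execute(
        """
        SELECT r.id,
               CASE WHEN r.pages_read = b.page_count
                         AND b.page_count = b.planned_pages
                    THEN 1 ELSE 0 END AS is_finished
        FROM reader AS r
        JOIN book AS b ON b.id = r.book_id
        WHERE r.book_id = ?
        ORDER BY r.id
        """,
        (book_id,),
    ).fetchall()
```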
