This question is a continuation of one I asked yesterday: I'm still not sure if a post_save handler or a 2nd Celery task is the best way to update many objects based on the results of the first Celery task, but I plan to test performance down the line. Here's a recap of what's happening:
    Celery task, every 30s:
        Update page_count field of Book object based on conditions
                          |
        post_save(Book)   |
                          V
        Update some field on all Reader objects w/ foreign key to updated Book
        (update will have different results per-Reader; thousands of Readers could be FKed to Book)
The first task could save ~10 objects, requiring the update to all related Reader objects for each.
Whichever proves to be better between post_save and another task, they must accomplish the same thing: update potentially tens to hundreds of thousands of objects in a table, with each object update being unique. It could be that my choice between post_save and Celery task is determined by which method will actually allow me to accomplish this goal.
Since I can't just use a few queryset update() commands, I need to somehow call a method or function that calculates the value of a field based on the result of the first Celery task as well as some of the values in the object. Here's an example:
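    from django.db import models

    class Reader(models.Model):
        # on_delete is required since Django 2.0; CASCADE is an assumption here
        book = models.ForeignKey(Book, on_delete=models.CASCADE)
        pages_read = models.IntegerField(default=0)
        book_finished = models.BooleanField(default=False)

        def determine_book_finished(self):
            # Compare against the related Book's page_count (via self.book)
            if self.pages_read == self.book.page_count:
                self.book_finished = True
            else:
                self.book_finished = False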
This is a contrived example, but if the page_count was updated in the first task, I want all Readers foreign keyed to the Book to have their book_finished recalculated, and looping over a queryset seems like a really inefficient way to go about it.
My thought was to somehow call a model method such as determine_book_finished() on an entire queryset at once, but I can't find any documentation on how to do something like that. Custom querysets don't appear to be intended for actually operating on objects in the queryset beyond the built-in update() capability.
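For the contrived example above, the per-Reader result can be expressed as a single conditional UPDATE rather than a Python loop. The following is a minimal sketch of that idea using the standard-library sqlite3 module instead of the Django ORM, so the table and column names (book, reader, and so on) and the helper names are assumptions for illustration only. In Django, the Case and When conditional expressions passed to update() are one way to get a comparable query, though whether the real logic fits into a single expression depends on how complex it is.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with one book and three readers (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE book (id INTEGER PRIMARY KEY, page_count INTEGER);
        CREATE TABLE reader (
            id INTEGER PRIMARY KEY,
            book_id INTEGER REFERENCES book(id),
            pages_read INTEGER DEFAULT 0,
            book_finished INTEGER DEFAULT 0
        );
        INSERT INTO book (id, page_count) VALUES (1, 300);
        INSERT INTO reader (id, book_id, pages_read)
        VALUES (1, 1, 300), (2, 1, 120), (3, 1, 250);
        """
    )
    return conn


def recalc_book_finished(conn, book_id):
    """Recompute book_finished for every reader of one book in a single UPDATE.

    Each row gets its own result from the CASE expression, so no Python-side
    loop over readers is needed. Returns the number of rows touched.
    """
    cur = conn.execute(
        """
        UPDATE reader
        SET book_finished = CASE
            WHEN pages_read = (SELECT page_count FROM book WHERE id = reader.book_id)
            THEN 1 ELSE 0
        END
        WHERE book_id = ?
        """,
        (book_id,),
    )
    conn.commit()
    return cur.rowcount


def finished_ids(conn):
    """Return the ids of readers currently flagged as finished, in id order."""
    rows = conn.execute(
        "SELECT id FROM reader WHERE book_finished = 1 ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]
```

The database does the per-row work here, so the cost is one statement per updated Book rather than one save per Reader.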
This post using Celery is the most promising thing I've found, and since Django signals are sync, using another Celery task would also have the benefit of not holding anything else up. So even though I'd still need to loop over a queryset, it'd be async and any querysets that needed to be updated could be handled by separate tasks, hopefully in parallel.
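If the loop does have to stay in Python, the "separate tasks, hopefully in parallel" idea could be sketched as splitting the Reader primary keys into batches and handing each batch to its own task. The helper names below (chunked, fan_out) and the batch size are hypothetical; enqueue stands in for whatever actually queues the work, which with Celery might be the .delay method of a task that updates one batch of Readers.

```python
from typing import Callable, Iterable, Iterator, List


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` ids."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[int] = []
    for pk in ids:
        batch.append(pk)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch
        yield batch


def fan_out(
    reader_ids: Iterable[int],
    enqueue: Callable[[List[int]], object],
    batch_size: int = 1000,
) -> int:
    """Call `enqueue` once per batch of ids and return the number of batches.

    `enqueue` is a stand-in (assumption): with Celery it could be something
    like `update_readers.delay`, where `update_readers` is a hypothetical task.
    """
    count = 0
    for batch in chunked(reader_ids, batch_size):
        enqueue(batch)
        count += 1
    return count
```

Passing primary keys rather than model instances keeps the task arguments small and serializable; each task would then fetch and update only its own batch.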
On the other hand, this question seems to have a solution too: register the method with the post_save signal, which presumably would run the method on all objects after receiving the signal. Would this be workable with thousands of objects needing update, as well as potentially other Books being updated by the same task and their thousands of associated Readers then needing update too?
Is there a best practice for doing what I'm trying to do here?
EDIT: I realized I could go about this another way: making the book_finished field a property determined at runtime rather than a static field.
    @property
    def book_finished(self):
        if self.pages_read == self.book.page_count:
            if self.book.page_count == self.book.planned_pages:
                return True
            else:
                return False
        # Outer condition not met: return False explicitly instead of None
        return False
This is close enough to my actual code, in that the first if branch contains a couple of elif branches, each having its own if-else, for a total maximum depth of 3 ifs.
Until I can spin up a lot of test data and simulate many simultaneous users, I may stick with this option as it definitely works (for now). I don't really like having the property calculated on every retrieval, but from some quick research, it doesn't seem like an overly slow method.
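To illustrate why the property approach avoids the bulk update entirely, here is a minimal sketch with plain dataclasses standing in for the Django models (an assumption made to keep it self-contained; the real models would be models.Model subclasses). Because the value is derived when it is read, a change to the Book's page_count is reflected in every Reader without touching any Reader row.

```python
from dataclasses import dataclass


@dataclass
class Book:
    page_count: int
    planned_pages: int


@dataclass
class Reader:
    book: Book
    pages_read: int = 0

    @property
    def book_finished(self) -> bool:
        # Derived on every access, so it never goes stale when the Book changes
        return (
            self.pages_read == self.book.page_count
            and self.book.page_count == self.book.planned_pages
        )
```

The trade-off is that a computed property like this cannot be used directly in a database filter, so queries such as "all finished Readers" would need the condition written out in the query itself.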