Suppose I have a model Box
with a GenericForeignKey
that points to either an Apple
instance or a Chocolate
instance. Apple
and Chocolate
, in turn, have ForeignKeys to Farm
and Factory
, respectively. I want to display a list of Box
es, for which I need to access Farm
and Factory
. How do I do this in as few DB queries as possible?
Minimal illustrative example:
class Farm(Model):
...
class Apple(Model):
farm = ForeignKey(Farm)
...
class Factory(Model):
...
class Chocolate(Model):
factory = ForeignKey(Factory)
...
class Box(Model)
content_type = ForeignKey(ContentType)
object_id = PositiveIntegerField()
content_object = GenericForeignKey('content_type', 'object_id')
...
def __unicode__(self):
if self.content_type == ContentType.objects.get_for_model(Apple):
apple = self.content_object
return "Apple {} from Farm {}".format(apple, apple.farm)
elif self.content_type == ContentType.objects.get_for_model(Chocolate):
chocolate = self.content_object
return "Chocolate {} from Factory {}".format(chocolate, chocolate.factory)
Here are a few things I tried. In all these examples, N is the number of Boxes. The query count assumes that the ContentType
s for Apple
and Chocolate
have already been cached, so the get_for_model()
calls do not hit the DB.
1) Naive:
print [box for box in Box.objects.all()]
This does 1 (fetch Boxes) + N (fetch Apple or Chocolate for each Box) + N (fetch Farm for each Apple and Factory for each Chocolate) queries.
2) select_related
doesn't help here, because Box.content_object
is a GenericForeignKey
.
3) As of django 1.4, prefetch_related
can fetch GenericForeignKey
s.
print [box for box in Box.objects.prefetch_related('content_object').all()]
This does 1 (fetch Boxes) + 2 (fetch Apples and Chocolates for all Boxes) + N (fetch Farm for each Apple and Factory for each Chocolate) queries.
4) Apparently prefetch_related
isn't smart enough to follow ForeignKeys of GenericForeignKeys. If I try:
print [box for box in Box.objects.prefetch_related(
'content_object__farm',
'content_object__factory').all()]
it rightfully complains that Chocolate
objects don't have a farm
field, and vice versa.
5) I could do:
apple_ctype = ContentType.objects.get_for_model(Apple)
chocolate_ctype = ContentType.objects.get_for_model(Chocolate)
boxes_with_apples = Box.objects.filter(content_type=apple_ctype).prefetch_related('content_object__farm')
boxes_with_chocolates = Box.objects.filter(content_type=chocolate_ctype).prefetch_related('content_object__factory')
This does 1 (fetch Boxes) + 2 (fetch Apples and Chocolates for all Boxes) + 2 (fetch Farms for all Apples and Factories for all Chocolates) queries. The downside is that I have to merge and sort the two querysets (boxes_with_apples
, boxes_with_chocolates
) manually. In my real application, I'm displaying these Boxes in a paginated ModelAdmin. It's not obvious how to integrate this solution there. Maybe I could write a custom Paginator to do this caching transparently?
6) I could cobble together something based on this that also does O(1) queries. But I'd rather not mess with internals (_content_object_cache
) if I can avoid it.
In summary: Printing a Box requires access to the ForeignKeys of a GenericForeignKey. How can I print N Boxes in O(1) queries? Is (5) the best I can do, or is there a simpler solution?
Bonus points: How would you refactor this DB schema to make such queries easier?