Django REST Serializer doing N+1 database calls fo

2019-03-31 21:51发布

问题:

I have a situation where my model has a Foreign Key relationship:

# models.py
class Child(models.Model):
    parent = models.ForeignKey(Parent,)

class Parent(models.Model):
    pass

and my serializer:

class ParentSerializer(serializer.ModelSerializer):
    child = serializers.SerializerMethodField('get_children_ordered')

    def get_children_ordered(self, parent):
        queryset = Child.objects.filter(parent=parent).select_related('parent')
        serialized_data = ChildSerializer(queryset, many=True, read_only=True, context=self.context)
        return serialized_data.data

    class Meta:
        model = Parent

When I call Parent in my views for N number of Parents, Django does N number of database calls inside the serializer when it grabs the children. Is there any way to get ALL children for ALL Parents to minimize the number of database calls?

I've tried this but it doesn't seem to solve my issue:

class ParentList(generics.ListAPIView):

    def get_queryset(self):
        queryset = Parent.objects.prefetch_related('child')
        return queryset

    serializer_class = ParentSerializer
    permission_classes = (permissions.IsAuthenticated,)

EDIT

I've updated the code below to reflect Alex's feedback....which solves the N+1 for one nested relationship.

# serializer.py
class ParentSerializer(serializer.ModelSerializer):
    child = serializers.SerializerMethodField('get_children_ordered')

    def get_children_ordered(self, parent):
        # The all() call should hit the cache
        serialized_data = ChildSerializer(parent.child.all(), many=True, read_only=True, context=self.context)
        return serialized_data.data

    class Meta:
            model = Parent

# views.py
class ParentList(generics.ListAPIView):

    def get_queryset(self):
        children = Prefetch('child', queryset=Child.objects.select_related('parent'))
        queryset = Parent.objects.prefetch_related(children)
        return queryset

    serializer_class = ParentSerializer
    permission_classes = (permissions.IsAuthenticated,)

Now let's say I have one more model, which is a grandchild:

# models.py
class GrandChild(models.Model):
    parent = models.ForeignKey(Child,)

class Child(models.Model):
    parent = models.ForeignKey(Parent,)

class Parent(models.Model):
    pass

If i place the following in my views.py for the Parent queryset:

queryset = Parent.objects.prefetch_related(children, 'children__grandchildren')

It doesn't look like those grandchildren are being carried on into the ChildSerializer, and thus, again I'm running another N+1 issue. Any thoughts on this one?

EDIT 2

Perhaps this will provide clarity...Maybe the reason i am still running into N + 1 database calls, is because both my children and grandchildren classes are Polymorphic.... i.e.

# models.py
class GrandChild(PolymorphicModel):
    child = models.ForeignKey(Child,)

class GrandSon(GrandChild):
    pass

class GrandDaughter(GrandChild):
    pass

class Child(PolymorphicModel):
    parent = models.ForeignKey(Parent,)

class Son(Child):
    pass

class Daughter(Child):
    pass

class Parent(models.Model):
    pass

and my serializers look more like this:

# serializer.py
class ChildSerializer(serializer.ModelSerializer):
    grandchild = serializers.SerializerMethodField('get_children_ordered')

    def to_representation(self, value):
        if isinstance(value, Son):
            return SonSerializer(value, context=self.context).to_representation(value)
        if isinstance(value, Daughter):
            return DaughterSerializer(value, context=self.context).to_representation(value)

    class Meta:
        model = Child

class ParentSerializer(serializer.ModelSerializer):
    child = serializers.SerializerMethodField('get_children_ordered')

    def get_children_ordered(self, parent):
        queryset = Child.objects.filter(parent=parent).select_related('parent')
        serialized_data = ChildSerializer(queryset, many=True, read_only=True, context=self.context)
        return serialized_data.data

    class Meta:
        model = Parent

Plus the same for Grandaughter, Grandson, I'll spare you the details codewise, but i think you get the picture.

When i run my view for ParentList, and i monitor DB queries, I'm getting something along the lines of 1000s of queries, for only a handful of parents.

If i run the same code in the django shell, i can accomplish the same query at no more than 25 queries. I suspect maybe it has something to do with the fact that I'm using the django-polymorphic library? The reason being is that, there's a Child and GrandChild database table, in additions to each Son/Daughter, Grandson/Granddaughter table, for a total of 6 tables. across those objects. So my gut tells me i'm missing those polymorphic tables.

Or perhaps there's a more elegant solution for my daata model?

回答1:

As far as I remember, nested serializers have access to prefetched relations, just make sure you don't modify a queryset (i.e. use all()):

class ParentSerializer(serializer.ModelSerializer):
    child = serializers.SerializerMethodField('get_children_ordered')

    def get_children_ordered(self, parent):
        # The all() call should hit the cache
        serialized_data = ChildSerializer(parent.child.all(), many=True, read_only=True, context=self.context)
        return serialized_data.data

    class Meta:
            model = Parent


class ParentList(generics.ListAPIView):

    def get_queryset(self):
        children = Prefetch('child', queryset=Child.objects.select_related('parent'))
        queryset = Parent.objects.prefetch_related(children)
        return queryset

    serializer_class = ParentSerializer
    permission_classes = (permissions.IsAuthenticated,)             


回答2:

This question is a bit old but I just came across a very similar problem and managed to reduce the db calls drastically. It seems to me that Django-mptt would make things much easier for you.

One way would be to define a single model with a ForeignKey to. This way you can find out the hierarchy by it's level in the tree. For example:

class Person(MPTTModel):
    parent = TreeForeignKey('self', null=True, blank=True, related_name='children', db_index=True)  

You can find out if the object is a parent by checking if Person.level = 0. If it equals 1, it's a child, 2 grandchild, etc...

Then, you could modify your code to the following:

# serializers.py
class ChildSerializer(serializers.ModelSerializer):
    children = serializers.SerializerMethodField()

    def get_children(self, parent):
        queryset = parent.get_children()
        serialized_data = ChildSerializer(queryset, many=True, read_only=True, context=self.context)
        return serialized_data.data

# views.py
class ParentList(generics.ListAPIView):

    def get_queryset(self):
        queryset = cache_tree_children(Person.objects.all())

With this, you'll eliminate your N+1 problem. If you want to add a new ForeignKey to a genre Model, for example, you could simply modify the last line to:

queryset = cache_tree_children(Person.objects.filter(channel__slug__iexact=channel_slug).select_related('genre'))