Django annotate queryset with intersection count

2019-07-14 09:30发布

问题:

Djangonauts, I need to tap your brains.

In a nutshell, I have the following three models:

class Location(models.Model):
    name = models.CharField(max_length=100)


class Profile(models.Model):
    locations_of_interest = models.ManyToManyField(Location)


class Question(models.Model):
    locations = models.ManyToManyField(Location)

I want to find all Profiles which locations of interest intersect with the locations specified for a certain question. That’s easy:

question = Question.objects.first()

matching_profiles = Profile.objects.filter(
    locations_of_interest__in=question.locations.all()
)

But in addition, I would also like to know to what extend the locations overlap.

In plain python, I could do something like this:

question_location_names = [l['name'] for l in question.locations.all()]

for profile in matching_profiles:
    profile_location_names = [l['name'] for l in profile.locations_of_interest.all()]
    intersection = set(question_location_names).intersection(profile_location_names)
    intersection_count = len(list(intersection))
    # then proceed with this number

However, it seems to me favourable to do the operation directly in the database if possible.

TL;DR

So my question is:

Is there a way to annotate the profile queryset with this intersection count and that way do the operation in the database?

I have tried several things, but I don’t think they are helpful for those who read this and might know an answer.

回答1:

You can perform this with an .annotate(..) this with a Count(..) on the locations_of_interest number:

from django.db.models import Count

matching_profiles = Profile.objects.filter(
    locations_of_interest__in=question.locations.all()
).annotate(
    locnom=Count('locations_of_interest')
)

Now every matching_profiles instance, will have an attribute called locnom that contains the number of location of interests that matched with the filter.

Note that Profiles without such locations, will not be in the queryset, and that every Profile will occur at most once.

EDIT: counting multiple related non-overlapping (!) fields

You can extend this approach by counting non-overlapping joins, by using distinct=True:

from django.db.models import Count

matching_profiles = Profile.objects.filter(
    locations_of_interest__in=question.locations.all(),
    industries_of_interest__in=question.industries.all()
).annotate(
    locnom=Count('locations_of_interest', distinct=True),
    indnom=Count('industries_of_interest', distinct=True)
)

Note however that this approach scales typically then exponentially with the number of JOINs, so this is typically not scalable if you would add tens our hundreds of annotations.