Django query for many-to-many subset containment

2020-07-06 09:13发布

问题:

Is there a way to query for subset or superset containment with many-to-many fields?

Suppose each Person has a list of birds they want to see, and each Aviary houses a list of birds. How can I make a query to find, for a given Person instance, which Aviaries have every bird on the person's list? And similarly, for a given Person instance, how do I find which Aviaries have only birds on the person's list (but not necessarily all of them).

Here are my Django 1.5 models:

class Bird(models.Model):
    name = models.CharField(max_length=255, unique=True)

class Aviary(models.Model):
    name = models.CharField(max_length=255, unique=True)
    birds = models.ManyToManyField(Bird)

class Person(models.Model):
    name = models.CharField(max_length=255, unique=True)
    birds_to_see = models.ManyToManyField(Bird)

I know how I would find the aviaries that have at least one of a person's birds, but I don't see how I would adapt that here. (See, for instance: django queryset for many-to-many field)

If there is a query that does what I want, I am also interested to know if/why it's preferable to doing this more "manually." For example, I could loop over aviaries, extract each aviary's bird list, and see if the person's birds_to_see is a subset or superset of the aviary's bird list:

def find_aviaries(self):
    person_birds = set(self.birds_to_see.all())
    found_aviaries = []
    for aviary in Aviary.objects.all():
        aviary_birds = set(aviary.birds.all())
        if person_birds.issubset(aviary_birds):
            found_aviaries.append(aviary)            
    return found_aviaries

Any help is appreciated!

回答1:

A nice solution exists for Django >= 2.0. It is possible to annotate Aviaries by the number of matching Birds and to filter Aviaries that match at least one Bird or a required number.

from django.db.models import Count

    ...
    person_birds = set(self.birds_to_see.all())
    aviaries = (
        Aviary.objects
        .annotate(bird_match_count=Count('birds', filter=Q(birds__in=person_birds)))
        .filter(bird_match_count__gt=0)
    )

It is then trivial to filter a new queryset by bird_match_count=len(person_birds) or to filter the original queryset in Python or to sort it by bird_match_count.

Django < 2.0 would require to reference the intermediary model AviaryBirds and will be more verbose.


Verified by reading the SQL:

>>> print(aviaries.query)
SELECT aviary.id, aviary.name,
  COUNT(CASE WHEN  aviary_birds.bird_id IN (1,..)  THEN aviary_birds.bird_id ELSE NULL END)
    AS bird_match_count
FROM aviary LEFT OUTER JOIN aviary_birds ON (aviary.id = aviary_birds.aviary_id)
GROUP BY aviary.id, aviary.name
HAVING
  COUNT(CASE WHEN (aviary_birds.bird_id IN (1,..)) THEN aviary_birds.bird_id ELSE NULL END)
     > 0


回答2:

Using a Postgres Subquery Array construct, you can annotate on the ids, and then filter accordingly:

birds = Aviary.birds.through.objects.filter(
    aviary=OuterRef('pk')
).values('bird')
aviaries = Aviary.objects.annotate(
    bird_ids=SubqueryArray(birds)
).filter(bird_ids__contains=target_bird_ids)

You can also use __contained_by to go the other way (or __overlap if you just want any match).

All you need then is a suitable SubqueryArray class:

class SubqueryArray(django.db.models.expressions.Subquery):
    template = 'ARRAY(%(subquery)s)'
    output_field = ArrayField(base_field=models.CharField())

You may need to adjust the output field on that, depending upon what your PK field is.