So here are my models:
class Event(models.Model):
user = models.ForeignKey(User, blank=True, null=True, db_index=True)
name = models.CharField(max_length = 200, db_index=True)
platform = models.CharField(choices = (("ios", "ios"), ("android", "android")), max_length=50)
class User(AbstractUser):
email = models.CharField(max_length=50, null=False, blank=False, unique=True)
Event
is like an analytics event, so it's very possible that I could have multiple events for one user, some with platform=ios
and some with platform=android
, if a user has logged in on multiple devices. I want to query to see how many users have both ios and android devices. So I wrote a query like this:
User.objects.filter(Q(event__platform="ios") & Q(event__platform="android")).count()
Which returns 0 results. I know this isn't correct. I then thought I would try to just query for iOS users:
User.objects.filter(Q(event__platform="ios")).count()
Which returned 6,717,622 results, which is unexpected because I only have 39,294 users. I'm guessing it's not counting the Users, but counting the Event
instances, which seems like incorrect behavior to me. Does anyone have any insights into this problem?
You can use annotations instead:
So it will filter out any user that has two events.
You can also use chained filters:
Which first filter will get all users with android platform and the second one will get the users that also have iOS platform.
This is generally an answer for a queryset with two or more conditions related to children objects.
Solution: A simple solution with two subqueries is possible, even without any join:
The method
.order_by()
is important if the model Event has a default ordering (see it in the docs about distinct() method).Notes:
Verify the only SQL request that will be executed: (Simplified by removing "app_" prefix.)
Q()
is used because the same condition parameter (pk__in
) can not be repeated in the samefilter()
, but also chained filters could be used instead:.filter(...).filter(...)
. (The order of filter conditions is not important and it is outweighed by preferences estimated by SQL server optimizer.)base_subq
is an "alias" queryset only to don't repeat the same part of expression that is never evaluated individually.Another solution with two nested subqueries This non symmetric solution can be faster if we know that one subquery (that we put innermost) has a much more restrictive filter than another necessary subquery with a huge set of results. (example if a number of Android users would be huge)
Verify how it is compiled to SQL: (simplified again by removing
app_
prefix and"
.)(These solutions work also in an old Django e.g. 1.8. A special subquery function
Subquery()
exists since Django 1.11 for more complicated cases, but we didn't need it for this simple question.)