I am trying to make a very simple Subquery
that uses OuterRef
(not for practical purposes, just to get it working), but keep running into same error.
posts/models.py
code
from django.db import models
class Tag(models.Model):
name = models.CharField(max_length=120)
def __str__(self):
return self.name
class Post(models.Model):
title = models.CharField(max_length=120)
tags = models.ManyToManyField(Tag)
def __str__(self):
return self.title
manage.py shell
code
>>> from django.db.models import OuterRef, Subquery
>>> from posts.models import Tag, Post
>>> tag1 = Tag.objects.create(name='tag1')
>>> post1 = Post.objects.create(title='post1')
>>> post1.tags.add(tag1)
>>> Tag.objects.filter(post=post1.pk)
<QuerySet [<Tag: tag1>]>
>>> tags_list = Tag.objects.filter(post=OuterRef('pk'))
>>> Post.objects.annotate(count=Subquery(tags_list.count()))
The last two lines should give me number of tags for each Post object. And here I keep getting the same error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
One of the problems with your example is that you cannot use queryset.count()
as a subquery, because .count()
tries to evaluate the queryset and return the count.
So one may think that the right approach would be to use Count()
instead. Maybe something like this:
Post.objects.annotate(
count=Count(Tag.objects.filter(post=OuterRef('pk')))
)
This wont work for two reasons:
The Tag
queryset selects all Tag
fields, while Count
can only count on one field. Thus: Tag.objects.filter(post=OuterRef('pk')).only('pk')
is needed (to select counting on tag.pk
).
Count
itself is not a Subquery
class, Count
is an Aggregate
. So the expression generated by Count
is not recognized as a Subquery
(OuterRef
requires subquery), we can fix that by using Subquery
.
Apply-ing fixes for 1) and 2) would produce:
Post.objects.annotate(
count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)
However
if you inspect the query being produced
SELECT
"tests_post"."id",
"tests_post"."title",
COUNT((SELECT U0."id"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id"))
) AS "count"
FROM "tests_post"
GROUP BY
"tests_post"."id",
"tests_post"."title"
You may notice that we have a GROUP BY
clause. This is because Count is an Aggregate, right now it does not affect the result, but in some other cases it may. Thats why the docs suggest a little bit different approach, where the aggregation is moved into the subquery
via a specific combination of values
+ annotate
+ values
Post.objects.annotate(
count=Subquery(
Tag.objects.filter(post=OuterRef('pk'))
# The first .values call defines our GROUP BY clause
# Its important to have a filtration on every field defined here
# Otherwise you will have more than one group per row!!!
# This will lead to subqueries to return more than one row!
# But they are not allowed to do that!
# In our example we group only by post
# and we filter by post via OuterRef
.values('post')
# Here we say: count how many rows we have per group
.annotate(count=Count('pk'))
# Here we say: return only the count
.values('count')
)
)
Finally this will produce:
SELECT
"tests_post"."id",
"tests_post"."title",
(SELECT COUNT(U0."id") AS "count"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id")
GROUP BY U1."post_id"
) AS "count"
FROM "tests_post"
The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils
and then:
from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(tag_count=SubqueryCount('tag'))
The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.