In a Django subquery, can I reference the “parent”

2019-05-21 22:08发布

It's simple to create subqueries in Django ORM (just use a QuerySet as part of another query), but is it possible for that subquery to reference fields in the "parent" (outer, main) query?

For a full example of what I'm trying to achieve, see this working SQL Fiddle. I broke it down into two questions (other one here). In this case, I have a model Whole that represents a value that must be reached. Several Parts contribute to it with a (calculated) value of their own. I want to retrieve all the Wholes which has not been completed yet (i.e. the total_value is different from the sum of individual values).

select w.*
  from whole w
  where w.total_value != (
    select sum(value expression)
      from part p
      where p.whole_id = w.id
      group by p.whole_id
  );

I don't know how (or if it's even possible) to do this using Django ORM. I've seen many examples of subqueries using __in (and could confirm by print qs.query that the result is indeed ran as a single query), but only when both queries are independent of each other. Here, the subquery is constrained by a field in the parent query (w.id). I thought of using F(), Q(), or even extra, but can't quite figure out what to do...

Here's a SSCCE, in case anyone want to experiment with it: Download or Browse. It has the same models and data as the SQL fiddle linked above.


Update: for my particular case, I found out there's no need to do a subquery, I can just use group by and having (as this SQL Fiddle shows):

q = Q(part__isnull=True) | ~Q(partial=F('total_value'))
qs = Whole.objects.annotate(partial=Sum(...)).filter(q).distinct()

# And if total_value can be zero:
qs = qs.exclude(part__isnull=True, total_value=0)

The general case for subqueries is still unsolved though (short of using some raw SQL, as my answer below shows).

1条回答
爷、活的狠高调
2楼-- · 2019-05-21 22:32

The solution I devised with the least raw SQL uses extra and where:

  • First create the inner query; use extra to specify a custom where component, comparing the restricted field to the one in the outer query, as it will appear there (might need to hardcode the table name/alias):

    qs1 = Part.objects.extra(where=['whole_id = "applabel_whole"."id"'])...
    

    Then make the remaining operations on it (in this case, using values and annotate for grouping, aggregation and return of a single field).

  • Then include the generated SQL of the inner query (using .query) in the outer query, also using extra and where:

    qs = Whole.objects.extra(where=['total_value != ({})'.format(qs1.query)])
    

The code fragment in the extra calls might not be portable (ex.: some backends use !=, others use <>, the correct way of quoting table names might vary, etc), but the rest of the inner query shall be (since it was generated by the ORM).

The resulting query corresponds to what I'm looking for (except for the aggregation part, which is covered in the other question). SQL formatted for readability:

>>> qs1 = Part.objects.extra(
        where=['whole_id = "aggregation_subquery_whole"."id"']
    ).values('whole_id').annotate(sum=Sum('before__value')).values('sum')

>>> qs = Whole.objects.extra(where=['total_value != ({})'.format(qs1.query)])

>>> print qs.query

SELECT "aggregation_subquery_whole"."id",
       "aggregation_subquery_whole"."total_value" 
FROM "aggregation_subquery_whole"
WHERE total_value != (
    SELECT SUM("aggregation_subquery_sequence"."value") AS "sum"
    FROM "aggregation_subquery_part"
        LEFT OUTER JOIN "aggregation_subquery_sequence" ON
           ("aggregation_subquery_part"."before_id" =
            "aggregation_subquery_sequence"."id") 
    WHERE whole_id = "aggregation_subquery_whole"."id"
    GROUP BY "aggregation_subquery_part"."whole_id"
)
查看更多
登录 后发表回答