How to use subquery in django?

2019-03-17 21:28发布

问题:

I want to get a list of the latest purchase of each customer, sorted by the date.

The following query does what I want except for the date:

(Purchase.objects
         .all()
         .distinct('customer')
         .order_by('customer', '-date'))

It produces a query like:

SELECT DISTINCT ON 
    "shop_purchase.customer_id" 
    "shop_purchase.id" 
    "shop_purchase.date" 
FROM "shop_purchase" 
ORDER BY "shop_purchase.customer_id" ASC, 
         "shop_purchase.date" DESC;

I am forced to use customer_id as the first ORDER BY expression because of DISTINCT ON.

I want to sort by the date, so what the query I really need should look like this:

SELECT * FROM (
  SELECT DISTINCT ON 
      "shop_purchase.customer_id" 
      "shop_purchase.id" 
      "shop_purchase.date" 
  FROM "shop_purchase" 
  ORDER BY "shop_purchase.customer_id" ASC, 
           "shop_purchase.date" DESC;
  )
AS result 
ORDER BY date DESC;

I don't want to sort using python because I still got to page limit the query. There can be tens of thousands of rows in the database.

In fact it is currently sorted by in python now and is causing very long page load times, so that's why I'm trying to fix this.

Basically I want something like this https://stackoverflow.com/a/9796104/242969. Is it possible to express it with django querysets instead of writing raw SQL?

The actual models and methods are several pages long, but here is the set of models required for the queryset above.

class Customer(models.Model):
  user = models.OneToOneField(User)

class Purchase(models.Model):
  customer = models.ForeignKey(Customer)
  date = models.DateField(auto_now_add=True)
  item = models.CharField(max_length=255)

If I have data like:

Customer A - 
    Purchase(item=Chair, date=January), 
    Purchase(item=Table, date=February)
Customer B - 
    Purchase(item=Speakers, date=January), 
    Purchase(item=Monitor,  date=May)
Customer C - 
    Purchase(item=Laptop,  date=March), 
    Purchase(item=Printer, date=April)

I want to be able to extract the following:

Purchase(item=Monitor, date=May)
Purchase(item=Printer, date=April)
Purchase(item=Table,   date=February)

There is at most one purchase in the list per customer. The purchase is each customer's latest. It is sorted by latest date.

This query will be able to extract that:

SELECT * FROM (
  SELECT DISTINCT ON 
    "shop_purchase.customer_id" 
    "shop_purchase.id" 
    "shop_purchase.date" 
  FROM "shop_purchase" 
  ORDER BY "shop_purchase.customer_id" ASC, 
           "shop_purchase.date" DESC;
) 
AS result 
ORDER BY date DESC;

I'm trying to find a way not to have to use raw SQL to achieve this result.

回答1:

This may not be exactly what you're looking for, but it might get you closer. Take a look at Django's annotate.

Here is an example of something that may help:

  from django.db.models import Max
  Customer.objects.all().annotate(most_recent_purchase=Max('purchase__date'))

This will give you a list of your customer models each one of which will have a new attribute called "most_recent_purchase" and will contain the date on which they made their last purchase. The sql produced looks like this:

SELECT "demo_customer"."id", 
       "demo_customer"."user_id", 
       MAX("demo_purchase"."date") AS "most_recent_purchase"
FROM "demo_customer"
LEFT OUTER JOIN "demo_purchase" ON ("demo_customer"."id" = "demo_purchase"."customer_id")
GROUP BY "demo_customer"."id",
         "demo_customer"."user_id"

Another option, would be adding a property to your customer model that would look something like this:

  @property
  def latest_purchase(self):
    return self.purchase_set.order_by('-date')[0]

You would obviously need to handle the case where there aren't any purchases in this property, and this would potentially not perform very well (since you would be running one query for each customer to get their latest purchase).

I've used both of these techniques in the past and they've both worked fine in different situations. I hope this helps. Best of luck!



回答2:

Whenever there is a difficult query to write using Django ORM, I first try the query in psql(or whatever client you use). The SQL that you want is not this:

SELECT * FROM (
  SELECT DISTINCT ON 
    "shop_purchase.customer_id" "shop_purchase.id" "shop_purchase.date" 
  FROM "shop_purchase" 
  ORDER BY "shop_purchase.customer_id" ASC, "shop_purchase.date" DESC;
  ) AS result 
ORDER BY date DESC;

In the above SQL, the inner SQL is looking for distinct on a combination of (customer_id, id, and date) and since id will be unique for all, you will get all records from the table. I am assuming id is the primary key as per convention.

If you need to find the last purchase of every customer, you need to do something like:

SELECT  "shop_purchase.customer_id", max("shop_purchase.date")
FROM shop_purchase
GROUP BY 1 

But the problem with the above query is that it will give you only the customer name and date. Using that will not help you in finding the records when you use these results in a subquery.

To use IN you need a list of unique parameters to identify a record, e.g., id

If in your records id is a serial key, then you can leverage the fact that the latest date will be the maximum id as well. So your SQL becomes:

SELECT  max("shop_purchase.id") 
FROM shop_purchase
GROUP BY "shop_purchase.customer_id";

Note that I kept only one field (id) in the selected clause to use it in a subquery using IN.

The complete SQL will now be:

SELECT * 
FROM shop_customer 
WHERE "shop_customer.id" IN 
    (SELECT  max("shop_purchase.id") 
     FROM shop_purchase
     GROUP BY "shop_purchase.customer_id");

and using the Django ORM it looks like:

(Purchase.objects.filter(
    id__in=Purchase.objects
                   .values('customer_id')
                   .annotate(latest=Max('id'))
                   .values_list('latest', flat=True)))

Hope it helps!



回答3:

I have a similar situation and this is how I'm planning to go about it:

query = Purchase.objects.distinct('customer').order_by('customer').query
query = 'SELECT * FROM ({}) AS result ORDER BY sent DESC'.format(query)
return Purchase.objects.raw(query)

Upside it gives me the query I want. Downside is that it is raw query and I can't append any other queryset filters.



回答4:

This is my approach if I need some subset of data (N items) along with the Django query. This is example using PostgreSQL and handy json_build_object() function (Postgres 9.4+), but same way you can use other aggregate function in other database system. For older PostgreSQL versions you can use combination of array_agg() and array_to_string() functions.

Imagine you have Article and Comment models and along with every article in the list you want to select 3 recent comments (change LIMIT 3 to adjust size of subset or ORDER BY c.id DESC to change sorting of subset).

qs = Article.objects.all()
qs = qs.extra(select = {
    'recent_comments': """
    SELECT
        json_build_object('comments',
            array_agg(
              json_build_object('id', id, 'user_id', user_id, 'body', body)
            )
        )
    FROM (
        SELECT
          c.id,
          c.user_id,
          c.body
        FROM app_comment c
        WHERE c.article_id = app_article.id
        ORDER BY c.id DESC
        LIMIT 3
    ) sub
    """
})

for article in qs:
    print(article.recent_comments)

# Output:
# {u'comments': [{u'user_id': 1, u'id': 3, u'body': u'foo'}, {u'user_id': 1, u'id': 2, u'body': u'bar'}, {u'user_id': 1, u'id': 1, u'body': u'joe'}]}
# ....