Django ORM - Grouped aggregates with different sel

Posted 2019-02-15 11:41

Question:

Imagine we have the Django ORM model Meetup with the following definition:

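from django.db import models

class Meetup(models.Model):
    # CharField requires max_length; 100 is an arbitrary choice.
    language = models.CharField(max_length=100)
    speaker = models.CharField(max_length=100)
    date = models.DateField(auto_now=True)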

I'd like to use a single query to fetch the language, speaker and date for the latest event for each language.

>>> Meetup.objects.create(language='python', speaker='mike')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='python', speaker='ryan')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='noah')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='shawn')
<Meetup: Meetup object>
>>> Meetup.objects.values("language").annotate(latest_date=models.Max("date")).values("language", "speaker", "latest_date")
[
    {'speaker': u'mike', 'language': u'python', 'latest_date': ...}, 
    {'speaker': u'ryan', 'language': u'python', 'latest_date': ...}, 
    {'speaker': u'noah', 'language': u'node', 'latest_date': ...}, 
    {'speaker': u'shawn', 'language': u'node', 'latest_date': ...}, 
]

D'oh! We're getting the latest event, but for the wrong grouping!

It seems like I need a way to GROUP BY the language but SELECT on a different set of fields?


Update - this sort of query seems fairly easy to express in SQL:

SELECT language, speaker, MAX(date)
FROM app_meetup
GROUP BY language;

I'd love a way to do this without using Django's raw() - is it possible?

Update 2 - after much searching, it seems there are similar questions on SO:

  • Django Query that gets the most recent objects
  • How can I do a greatest n per group query in Django
  • MySQL calls this sort of query a group-wise maximum of a certain column.

Update 3 - in the end, with @danihp's help, it seems the best you can do is two queries. I've used the following approach:

from django.db.models import Max

# Abuse the fact that the latest Meetup always has a higher PK to build
# a ValuesList of the latest Meetups grouped by "language".
latest_meetup_pks = (Meetup.objects.values("language")
                                   .annotate(latest_pk=Max("pk"))
                                   .values_list("latest_pk", flat=True))

# Use a second query to grab those latest Meetups!
Meetup.objects.filter(pk__in=latest_meetup_pks)
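
For reference, here is a rough sketch of the SQL that the max-PK approach above corresponds to, run through Python's built-in sqlite3 so it works without a Django project. The table layout, the helper name latest_by_max_pk and the sample dates are assumptions made for illustration. Django's documentation for the `in` lookup says a queryset passed to it is evaluated as a subselect, so the two steps above should reach the database as a single statement of roughly this shape.

```python
import sqlite3


def latest_by_max_pk(rows):
    """Return one (language, speaker, date) row per language: the one with the highest id.

    rows: iterable of (language, speaker, date) tuples, inserted in order.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the table Django would create for Meetup.
    conn.execute(
        "CREATE TABLE app_meetup ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "language TEXT, speaker TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
        rows,
    )
    # Inner query: highest id per language. Outer query: fetch those rows.
    result = conn.execute(
        "SELECT language, speaker, date FROM app_meetup "
        "WHERE id IN (SELECT MAX(id) FROM app_meetup GROUP BY language) "
        "ORDER BY language"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data; insertion order matches date order.
sample = [
    ("python", "mike", "2019-01-01"),
    ("python", "ryan", "2019-02-01"),
    ("node", "noah", "2019-01-15"),
    ("node", "shawn", "2019-02-10"),
]
latest = latest_by_max_pk(sample)
```

This only gives the right answer while a higher primary key really does imply a later date, which is the assumption the comment above calls out.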

This question is a follow up to my previous question:

Django ORM - Get latest record for group

Answer 1:

Queries like this are easy to explain but hard to write. If this were SQL, I would suggest a CTE that ranks rows with a window function partitioned by language and ordered by date (descending), then filters on the top rank.
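
A minimal sketch of that ranked CTE, run through Python's built-in sqlite3. It assumes an SQLite build with window function support (3.25 or newer); the table layout, the helper name latest_by_rank and the sample dates are made up for illustration.

```python
import sqlite3

# Rank rows within each language by date, newest first, then keep rank 1.
RANKED_SQL = """
WITH ranked AS (
    SELECT language, speaker, date,
           ROW_NUMBER() OVER (
               PARTITION BY language ORDER BY date DESC
           ) AS rn
    FROM app_meetup
)
SELECT language, speaker, date FROM ranked WHERE rn = 1 ORDER BY language
"""


def latest_by_rank(rows):
    """Return the newest (language, speaker, date) row for each language."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the Meetup table.
    conn.execute(
        "CREATE TABLE app_meetup ("
        "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
        rows,
    )
    result = conn.execute(RANKED_SQL).fetchall()
    conn.close()
    return result


# Made-up sample data with distinct ISO dates.
sample = [
    ("python", "mike", "2019-01-01"),
    ("python", "ryan", "2019-02-01"),
    ("node", "noah", "2019-01-15"),
    ("node", "shawn", "2019-02-10"),
]
ranked_latest = latest_by_rank(sample)
```

ROW_NUMBER() always yields exactly one row per language; if two meetups share the latest date, which of them wins is not defined unless a tie-breaker is added to the ORDER BY.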

But this is not SQL, this is the Django query API. The easy way is to run one query per language:

languages = Meetup.objects.values_list("language", flat=True).order_by().distinct()
last_by_language = [  Meetup
                     .objects
                     .filter( language = l )
                     .latest( 'date' )
                     for l in languages
                    ]

This crashes if some language has no meetups. The other approach is to get the max date for each language:

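from functools import reduce  # needed on Python 3; a builtin on Python 2

from django.db import models
from django.db.models import Q

last_dates = ( Meetup
             .objects
             .values("language")
             .annotate(ldate=models.Max("date"))
             .order_by() )

# OR together one (language, max date) condition per language.
q = reduce(lambda q, meetup:
     q | ( Q( language = meetup["language"] ) & Q( date = meetup["ldate"] ) ),
     last_dates, Q())

your_query = Meetup.objects.filter(q)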
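
The OR of (language, max date) conditions built above amounts to joining the table against its own grouped maxima. Here is a hedged sketch of that in plain SQL via Python's sqlite3; the table layout, the helper name latest_by_max_date and the sample data are assumptions for illustration. The sample deliberately includes a tie, since a DateField only has day resolution.

```python
import sqlite3


def latest_by_max_date(rows):
    """Return every row whose date equals the max date of its language."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the Meetup table.
    conn.execute(
        "CREATE TABLE app_meetup ("
        "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
        rows,
    )
    # g holds one (language, ldate) pair per language; the join keeps
    # only the rows that match their language's max date.
    result = conn.execute(
        "SELECT m.language, m.speaker, m.date "
        "FROM app_meetup AS m "
        "JOIN (SELECT language, MAX(date) AS ldate "
        "      FROM app_meetup GROUP BY language) AS g "
        "  ON m.language = g.language AND m.date = g.ldate "
        "ORDER BY m.language, m.speaker"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: both python meetups share the same date.
sample = [
    ("python", "mike", "2019-02-01"),
    ("python", "ryan", "2019-02-01"),
    ("node", "noah", "2019-01-15"),
    ("node", "shawn", "2019-02-10"),
]
by_date = latest_by_max_date(sample)
```

With the tie in the sample, python comes back twice (mike and ryan), so this approach returns "all rows on the latest date" rather than strictly one row per language.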

Perhaps someone can explain how to do it in a single query without raw sql.

Edited in response to OP's comment

You are looking for:

"SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language"

Not every RDBMS supports this expression, because every field in the select clause that is not wrapped in an aggregate function must also appear in the group by clause. In your case, speaker is in the select clause (without an aggregate function) but does not appear in the group by.

In MySQL there is no guarantee that the speaker shown is the one that matches the max date. Because of this, we are not facing an easy query.

Quoting MySQL docs:

In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause...However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.

The closest query to your requirements is:

results = (   Meetup
             .objects
             .values("language","speaker")
             .annotate(ldate=models.Max("date"))
             .order_by() )
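
Because speaker is now part of the grouping, this returns one row per (language, speaker) pair rather than one per language. A small sketch of the equivalent SQL, run through Python's sqlite3, makes that visible; the table layout, the helper name max_date_per_pair and the sample data are assumptions for illustration.

```python
import sqlite3


def max_date_per_pair(rows):
    """Return (language, speaker, max date) for each language/speaker pair."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the Meetup table.
    conn.execute(
        "CREATE TABLE app_meetup ("
        "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
        rows,
    )
    # Every non-aggregated column is in the GROUP BY, so this is
    # valid in standard SQL, but the groups are per speaker.
    result = conn.execute(
        "SELECT language, speaker, MAX(date) AS ldate "
        "FROM app_meetup "
        "GROUP BY language, speaker "
        "ORDER BY language, speaker"
    ).fetchall()
    conn.close()
    return result


# Made-up sample data: mike has spoken at two python meetups.
sample = [
    ("python", "mike", "2019-01-01"),
    ("python", "ryan", "2019-02-01"),
    ("python", "mike", "2019-03-01"),
    ("node", "noah", "2019-01-15"),
    ("node", "shawn", "2019-02-10"),
]
per_pair = max_date_per_pair(sample)
```

Here both python speakers and both node speakers appear, each with their own latest date, so a further step is still needed to pick one row per language.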