I'm using a transaction model to keep track all the events going through the system
class Transaction(models.Model):
actor = models.ForeignKey(User, related_name="actor")
acted = models.ForeignKey(User, related_name="acted", null=True, blank=True)
action_id = models.IntegerField()
......
how do I get the top 5 actors in my system?
In sql it will basically be
SELECT actor, COUNT(*) as total
FROM Transaction
GROUP BY actor
ORDER BY total DESC
According to the documentation, you should use:
from django.db.models import Count
Transaction.objects.all().values('actor').annotate(total=Count('actor')).order_by('total')
values() : specifies which columns are going to be used to "group by"
Django docs:
"When a values() clause is used to constrain the columns that are
returned in the result set, the method for evaluating annotations is
slightly different. Instead of returning an annotated result for each
result in the original QuerySet, the original results are grouped
according to the unique combinations of the fields specified in the
values() clause"
annotate() : specifies an operation over the grouped values
Django docs:
The second way to generate summary values is to generate an independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
The order by clause is self explanatory.
To summarize: you group by, generating a queryset of authors, add the annotation (this will add an extra field to the returned values) and finally, you order them by this value
Refer to https://docs.djangoproject.com/en/dev/topics/db/aggregation/ for more insight
Just like @Alvaro has answered the Django's direct equivalent for GROUP BY
statement:
SELECT actor, COUNT(*) AS total
FROM Transaction
GROUP BY actor
is through the use of values()
and annotate()
methods as follows:
Transaction.objects.values('actor').annotate(total=Count('actor')).order_by()
However one more thing must be pointed out:
If the model has a default ordering defined in class Meta
, the .order_by()
clause is obligatory for proper results. You just cannot skip it even when no ordering is intended.
Further, for a high quality code it is advised to always put a .order_by()
clause after annotate()
, even when there is no class Meta: ordering
. Such approach will make the statement future-proof: it will work just as intended, regardless of any future changes to class Meta: ordering
.
Let me provide you with an example. If the model had:
class Transaction(models.Model):
actor = models.ForeignKey(User, related_name="actor")
acted = models.ForeignKey(User, related_name="acted", null=True, blank=True)
action_id = models.IntegerField()
class Meta:
ordering = ['id']
Then such approach WOULDN'T work:
Transaction.objects.values('actor').annotate(total=Count('actor'))
That's because Django performs additional GROUP BY
on every field in class Meta: ordering
If you would print the query:
>>> print Transaction.objects.values('actor').annotate(total=Count('actor')).query
SELECT "Transaction"."actor_id", COUNT("Transaction"."actor_id") AS "total"
FROM "Transaction"
GROUP BY "Transaction"."actor_id", "Transaction"."id"
It will be clear that the aggregation would NOT work as intended and therefore the .order_by()
clause must be used to clear this behaviour and get proper aggregation results.
See: Interaction with default ordering or order_by() in official Django documentation.