I have a very standard, basic social application -- with status updates (i.e., posts), and multiple comments per post.
Given the following simplified models, is it possible, using Django's ORM, to efficiently retrieve all posts and the latest two comments associated with each post, without performing N+1 queries? (That is, without performing a separate query to get the latest comments for each post on the page.)
class Post(models.Model):
title = models.CharField(max_length=255)
text = models.TextField()
class Comment(models.Model):
text = models.TextField()
post = models.ForeignKey(Post, related_name='comments')
class Meta:
ordering = ['-pk']
Post.objects.prefetch_related('comments').all()
fetches all posts and comments, but I'd like to retrieve a limited number of comments per post only.
UPDATE:
I understand that, if this can be done at all using Django's ORM, it probably must be done with some version of prefetch_related
. Multiple queries are totally okay, as long as I avoid making N+1 queries per page.
What is the typical/recommended way of handling this problem in Django?
UPDATE 2:
There seems to be no direct and easy way to do this efficiently with a simple query using the Django ORM. There are a number of helpful solutions/approaches/workarounds in the answers below, including:
- Caching the latest comment IDs in the database
- Performing a raw SQL query
- Retrieving all comment IDs and doing the grouping and "joining" in python
- Limiting your application to displaying the latest comment only
I didn't know which one to mark as correct because I haven't gotten a chance to experiment with all of these methods yet -- but I awarded the bounty to hynekcer for presenting a number of options.
UPDATE 3:
I ended up using @user1583799's solution.
This solution is optimized for memory requirements, as you expect it important. It needs three queries. The first query asks for posts, the second query only for tuples (id, post_id). The third for details of filtered latest comments.
But I think it will be faster for many database backends to combine the second and the third query into one and so to ask immediately for all fields of comments. Unuseful comments will be forgotten immediately.
The fastest solution would be with nested queries. The algorithm is like the one above, but everything is realized by raw SQL. It is limited only to some backends like PostgresQL.
EDIT
I agree that is not useful for you
and therefore I wrote that relatively complicated solution that 99% of them will be read continuously without loading into memory.
EDIT
A) Nested query for PostgresQL
Or explicitely require with less memory if we don't believe the optimizer. It should read data only from index in two inner selects, which is much less data than from the table.:
B) With a cached ID of the oldest displayed comment
Add field "oldest_displayed" to Post
class Post(models.Model):
oldest_displayed = models.IntegerField()
Filter comments for pk if interesting posts (that you have selected earlier by categories etc.)
Filter
Hmm, very nice ... and how it is compiled by Django?
Prepare all "oldest_displayed" by one nested SQL initially (and set zero for posts with less than two comments):
If you're using Django 1.7 the new
Prefetch
objects—allowing you to customize the prefetch queryset—could prove helpful.Unfortunately I can't think of a simple way to do exactly what you're asking. If you're on PostgreSQL and are willing to get just the latest comment for each post, the following should work in two queries:
Another variation: if your comments had a timestamp and you wanted to limit the comments to the most recent ones by date, that would look something like:
...and then as above. Of course, you could still post-process the resulting list to limit the display to a maximum of two comments.
prefetch_related('comments')
will fetch all comments of the posts.I had the same problem, and the database is Postgresql. I found a way:
Add a extra field
related_replies
. Note the FieldType isArrayField
, which support in django1.8dev. I copy the code to my project(the version of django is 1.7), just change 2 lines, it works.(or use djorm-pg-array )class Post(models.Model): related_replies = ArrayField(models.IntegerField(), size=10, null=True)
And use two queries:
When new comment comes, update
related_replies
.