Related to - PostgreSQL DISTINCT ON with different ORDER BY
I have table purchases (product_id, purchased_at, address_id)
Sample data:
| id | product_id | purchased_at | address_id |
| 1 | 2 | 20 Mar 2012 21:01 | 1 |
| 2 | 2 | 20 Mar 2012 21:33 | 1 |
| 3 | 2 | 20 Mar 2012 21:39 | 2 |
| 4 | 2 | 20 Mar 2012 21:48 | 2 |
The result I expect is the most recent purchased product (full row) for each address_id and that result must be sorted in descendant order by the purchased_at field:
| id | product_id | purchased_at | address_id |
| 4 | 2 | 20 Mar 2012 21:48 | 2 |
| 2 | 2 | 20 Mar 2012 21:33 | 1 |
Using query:
SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM "purchases"
WHERE "purchases"."product_id" = 2
ORDER BY purchases.address_id ASC, purchases.purchased_at DESC
I'm getting:
| id | product_id | purchased_at | address_id |
| 2 | 2 | 20 Mar 2012 21:33 | 1 |
| 4 | 2 | 20 Mar 2012 21:48 | 2 |
So the rows is same, but order is wrong. Any way to fix it?
Quite a clear question :)
And most likely a faster approach:
Your ORDER BY is used by DISTINCT ON for picking which row for each distinct address_id to produce. If you then want to order the resulting records, make the DISTINCT ON a subselect and order its results:
This query is trickier to rephrase properly than it looks.
The currently accepted, join-based answer doesn’t correctly handle the case where two candidate rows have the same given
purchased_at
value: it will return both rows.You can get the right behaviour this way:
Note how it has a fallback of comparing
id
values to disambiguate the case in which thepurchased_at
values match. This ensures that the condition can only ever be true for a single row among those that have the sameaddress_id
value.The original query using
DISTINCT ON
handles this case automatically!Also note the way that you are forced to encode the fact that you want “the latest for each
address_id
” twice, both in thegiven.purchased_at < other.purchased_at
condition and theORDER BY purchased_at DESC
clause, and you have to make sure they match. I had to spend a few extra minutes to convince myself that this query is really positively correct.It’s much easier to write this query correctly and understandbly by using
DISTINCT ON
together with an outer subquery, as suggested by dbenhur.