I have the following query, which fetches the id of the latest N observations for each station:
SELECT id
FROM  (
   SELECT station_id, id, created_at,
          row_number() OVER (PARTITION BY station_id
                             ORDER BY created_at DESC) AS rn
   FROM  (
      SELECT station_id, id, created_at
      FROM   observations
      ) s
   ) s
WHERE  rn <= #{n}
ORDER  BY station_id, created_at DESC;
I have indexes on id, station_id, and created_at.
This is the only solution I have come up with that can fetch more than a single record per station. However, it is quite slow (154.0 ms for a table of 81,000 records).
How can I speed up the query?
Assuming the current version, Postgres 9.3.
Index
First, a multicolumn index will help:
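A sketch of such an index, using the table and column names from the question (the index name is just an example):

CREATE INDEX observations_station_created_idx
ON observations (station_id, created_at DESC, id);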
created_at DESC is a slightly better fit, but the index would still be scanned backwards at almost the same speed without DESC.
This assumes created_at is defined NOT NULL; else consider DESC NULLS LAST in index and query (see the sketch after this paragraph).
The last column id is only useful if you get an index-only scan out of this, which probably won't work if you add lots of new rows constantly. In this case, remove id from the index.
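If created_at can be NULL, a sketch of the NULLS LAST variant, which the query's ORDER BY then has to match:

CREATE INDEX observations_station_created_idx
ON observations (station_id, created_at DESC NULLS LAST, id);
-- queries must sort the same way to use the index:
-- ORDER BY created_at DESC NULLS LAST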
Simpler query (still slow)
Simplify your query; the inner subselect doesn't help:
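A sketch of the simplified form, keeping the #{n} placeholder from the question:

SELECT id
FROM  (
   SELECT station_id, id, created_at,
          row_number() OVER (PARTITION BY station_id
                             ORDER BY created_at DESC) AS rn
   FROM   observations   -- no extra subselect around the base table
   ) s
WHERE  rn <= #{n}
ORDER  BY station_id, created_at DESC;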
Should be a bit faster, but still slow.
Fast query
This assumes station_id and id are defined as NOT NULL.
To be really fast, you need the equivalent of a loose index scan (not implemented in Postgres); a related answer covers emulating it.
If you have a separate table of stations (which seems likely), you can emulate this with JOIN LATERAL (Postgres 9.3+):
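A sketch of the LATERAL query, assuming a stations table with a station_id column (adjust names to your schema); each station runs its own small LIMIT query, which can use the index above:

SELECT o.id
FROM   stations s
CROSS  JOIN LATERAL (
   SELECT obs.id, obs.created_at
   FROM   observations obs
   WHERE  obs.station_id = s.station_id   -- one index probe per station
   ORDER  BY obs.created_at DESC
   LIMIT  #{n}
   ) o
ORDER  BY s.station_id, o.created_at DESC;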
If you don't have a table of stations, the next best thing would be to create and maintain one. Possibly add a foreign key reference to enforce relational integrity.
If that's not an option, you can distill such a table on the fly. Simple options would be:
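For example, either of these would produce the list of distinct stations:

SELECT DISTINCT station_id FROM observations;

SELECT station_id FROM observations GROUP BY 1;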
But those would need a sequential scan and be too slow. Trick Postgres into using the above index (or any btree index with station_id as the leading column) with a recursive CTE:
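A sketch of such a recursive CTE: each step jumps to the next larger station_id with a single index probe, emulating a loose index scan:

WITH RECURSIVE stations AS (
   SELECT min(station_id) AS station_id   -- smallest station, via index
   FROM   observations
   UNION ALL
   SELECT (SELECT min(station_id)         -- next larger station, via index
           FROM   observations
           WHERE  station_id > s.station_id)
   FROM   stations s
   WHERE  s.station_id IS NOT NULL        -- stop after the last station
   )
SELECT station_id
FROM   stations
WHERE  station_id IS NOT NULL;            -- drop the terminating NULL row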
Use that as a drop-in replacement for the stations table in the above simple query (see the combined sketch below). This should still be faster than what you had by orders of magnitude.
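A sketch of the combined query, substituting the recursive CTE for the stations table in the LATERAL query above:

WITH RECURSIVE stations AS (
   SELECT min(station_id) AS station_id
   FROM   observations
   UNION ALL
   SELECT (SELECT min(station_id)
           FROM   observations
           WHERE  station_id > s.station_id)
   FROM   stations s
   WHERE  s.station_id IS NOT NULL
   )
SELECT o.id
FROM   stations s
CROSS  JOIN LATERAL (
   SELECT obs.id, obs.created_at
   FROM   observations obs
   WHERE  obs.station_id = s.station_id
   ORDER  BY obs.created_at DESC
   LIMIT  #{n}
   ) o
WHERE  s.station_id IS NOT NULL           -- skip the CTE's terminating NULL row
ORDER  BY s.station_id, o.created_at DESC;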
SQL Fiddle.
This is a good answer only if you are not required to query up-to-date live data.
Preparation (requires PostgreSQL 9.3)
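A sketch of one way to set this up; the view and index names, and precomputing row_number() inside the view, are my assumptions:

-- Materialized views require PostgreSQL 9.3+
CREATE MATERIALIZED VIEW observations_ranked AS
SELECT station_id, id, created_at,
       row_number() OVER (PARTITION BY station_id
                          ORDER BY created_at DESC) AS rn
FROM   observations;

-- Index to make the rank filter cheap
CREATE INDEX observations_ranked_rn_idx ON observations_ranked (rn);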
How to query data:
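For instance, to fetch the latest 10 observations per station from the view sketched above:

SELECT id
FROM   observations_ranked
WHERE  rn <= 10
ORDER  BY station_id, created_at DESC;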
Your original query was 281 ms on my machine and this new one was 15 ms.
How to update the view with fresh data:
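Refreshing recomputes the view from the base table (note that a plain REFRESH in 9.3 locks the view against reads while it runs):

REFRESH MATERIALIZED VIEW observations_ranked;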
I have another solution that does not require a materialized view and works with live, up-to-date data. But given that you don't need up-to-date data, this materialized view is much more efficient.