I have a table bsort
:
CREATE TABLE bsort(a int, data text);
Here data
may be incomplete. In other words, some tuples may not have data
value.
And then I build a b-tree index on the table:
CREATE INDEX ON bsort USING BTREE(a);
Now if I perform this query:
SELECT * FROM bsort ORDER BY a;
Does PostgreSQL sort tuples with nlogn complexity, or does it get the order directly from the b-tree index?
You would need to check the execution plan. But, Postgres is quite capable of using the index to make the
order by
more efficient. It would read the records directly from the index. Because you have only one column, there is no need to access the data pages.For a simple query like this Postgres will use an index scan and retrieve readily sorted tuples from the index in order. Due to its MVCC model Postgres had to always visit the "heap" (data pages) additionally to verify entries are actually visible to the current transaction. Quoting the Postgres Wiki on index-only scans:
Which finally happened in version 9.2: index-only scans. The manual:
So now it depends on the visibility map of the table, whether index-only scans are possible. Only an option if all involved columns are included in the index. Else, the heap has to be visited (additionally) in any case. The sort step is still not needed.
That's why we sometimes append otherwise useless columns to indexes now. Like the
data
column in your example:It makes the index bigger (depends) and a bit more expensive to maintain and use for other purposes that do not allow an index-only scan. So only append the
data
column if you get index-only scans out of it. The order of columns in the index is important:The benefit of an index-only scan, per documentation:
The visibility map is maintained by
VACUUM
which happens automatically if you have autovacuum running (the default setting in modern Postgres). Details:But there is some delay between write operations to the table and the next
VACUUM
run. The gist of it:VACUUM
(and all older tansactions being finished), so it depends on the ratio between write operations andVACUUM
frequency.Partial index-only scans are still possible if some of the involved pages are marked all-visible. But if the heap has to be visited anyway, the access method "index scan" is a bit cheaper. So if too many pages are currently dirty, Postgres will switch to the cheaper index scan altogether. The Postgres Wiki again: