The index:
CREATE INDEX message_index ON message(queue_id, target_client_id, timestamp ASC, source_client_id);
The query:
EXPLAIN ANALYZE
SELECT content
FROM message
WHERE message.queue_id = 1
AND message.source_client_id = 1
AND (message.target_client_id = -1 OR message.target_client_id = 1);
Output:
Bitmap Heap Scan on message (cost=8.87..12.89 rows=1 width=13) (actual time=0.022..0.026 rows=50 loops=1)
Recheck Cond: (((queue_id = 1) AND (target_client_id = (-1)) AND (source_client_id = 1)) OR ((queue_id = 1) AND (target_client_id = 1) AND (source_client
_id = 1)))
-> BitmapOr (cost=8.87..8.87 rows=1 width=0) (actual time=0.017..0.017 rows=0 loops=1)
-> Bitmap Index Scan on message_index (cost=0.00..4.43 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=1)
Index Cond: ((queue_id = 1) AND (target_client_id = (-1)) AND (source_client_id = 1))
-> Bitmap Index Scan on message_index (cost=0.00..4.44 rows=1 width=0) (actual time=0.006..0.006 rows=50 loops=1)
Index Cond: ((queue_id = 1) AND (target_client_id = 1) AND (source_client_id = 1))
How can this query use the index with regards to source_client_id
i.e. the rightmost column in the composite index without the third column (timestamp
) being involved in the query at all?
According to the last answer here How important is the order of columns in indexes? this should not be valid. What am I missing?
Postgres can use other columns than the leading one for an index lookup - this is just not as efficient as using the leftmost column. Postgres will scan the entire index in that case (instead of the table). Whereas for conditions on the left most column(s) Postgres will only retrieve those rows from the index that match the condition. The difference in efficiency is therefore the number of index entries that are processed.
I think this is somehow hidden behind the following sentence from the manual:
Where "these columns" refers to the left most columns.
Btw: the reason why
timestamp
(ordate
ornumber
) is such a bad choice for a column is that it's a reserved word. But more importantly: the name doesn't document anything. Someone not familiar with the data model has no clue what you are storing there. The time the message was sent? The time the message was received? The time the message was last updated?