Composite index. - leftmost principle violated?

2019-08-17 03:35发布

The index:

CREATE INDEX message_index ON message(queue_id, target_client_id, timestamp ASC, source_client_id);

The query:

EXPLAIN ANALYZE 
SELECT content 
  FROM message 
WHERE message.queue_id = 1 
  AND message.source_client_id = 1 
  AND (message.target_client_id = -1 OR message.target_client_id = 1);

Output:

 Bitmap Heap Scan on message  (cost=8.87..12.89 rows=1 width=13) (actual time=0.022..0.026 rows=50 loops=1)
   Recheck Cond: (((queue_id = 1) AND (target_client_id = (-1)) AND (source_client_id = 1)) OR ((queue_id = 1) AND (target_client_id = 1) AND (source_client
_id = 1)))
   ->  BitmapOr  (cost=8.87..8.87 rows=1 width=0) (actual time=0.017..0.017 rows=0 loops=1)
         ->  Bitmap Index Scan on message_index  (cost=0.00..4.43 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=1)
               Index Cond: ((queue_id = 1) AND (target_client_id = (-1)) AND (source_client_id = 1))
         ->  Bitmap Index Scan on message_index  (cost=0.00..4.44 rows=1 width=0) (actual time=0.006..0.006 rows=50 loops=1)
               Index Cond: ((queue_id = 1) AND (target_client_id = 1) AND (source_client_id = 1))

How can this query use the index with regards to source_client_id i.e. the rightmost column in the composite index without the third column (timestamp) being involved in the query at all?

According to the last answer here How important is the order of columns in indexes? this should not be valid. What am I missing?

1条回答
放荡不羁爱自由
2楼-- · 2019-08-17 04:14

Postgres can use other columns than the leading one for an index lookup - this is just not as efficient as using the leftmost column. Postgres will scan the entire index in that case (instead of the table). Whereas for conditions on the left most column(s) Postgres will only retrieve those rows from the index that match the condition. The difference in efficiency is therefore the number of index entries that are processed.

I think this is somehow hidden behind the following sentence from the manual:

Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned.

Where "these columns" refers to the left most columns.


Btw: the reason why timestamp (or date or number) is such a bad choice for a column is that it's a reserved word. But more importantly: the name doesn't document anything. Someone not familiar with the data model has no clue what you are storing there. The time the message was sent? The time the message was received? The time the message was last updated?

查看更多
登录 后发表回答