How Bitmap Heap Scan and Index Scan is decided?

2019-04-29 06:53发布

问题:

I'm testing different queries and I'm curious about how db decide using Bitmap Heap Scan and Index Scan.

create index customers_email_idx on customers(email varchar_pattern_ops);

As you can see there is a customers table (dellstore example) and I add an index to email column.

First query is here:

select * from customers where email like 'ITQ%@dell.com'; -> query with Index Scan

Explain analyze query is here:

                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using customers_email_idx on customers  (cost=0.00..8.27 rows=2 width=268) (actual time=0.046..0.046 rows=0 loops=1)
   Index Cond: (((email)::text ~>=~ 'ITQ'::text) AND ((email)::text ~<~ 'ITR'::text))
   Filter: ((email)::text ~~ 'ITQ%@dell.com
 '::text)
 Total runtime: 0.113 ms

Other query is here:

select * from customers where email like 'IT%@dell.com'; -> query with Bitmap Heap Scan

Explain analyze query is here:

                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on customers  (cost=4.54..106.77 rows=2 width=268) (actual time=0.206..0.206 rows=0 loops=1)
   Filter: ((email)::text ~~ 'IT%@dell.com
 '::text)
   ->  Bitmap Index Scan on customers_email_idx  (cost=0.00..4.54 rows=29 width=0) (actual time=0.084..0.084 rows=28 loops=1)
         Index Cond: (((email)::text ~>=~ 'IT'::text) AND ((email)::text ~<~ 'IU'::text))
 Total runtime: 0.273 ms

Can you explain this example why Bitmap and Index Scan is used here?

Thank you..

回答1:

How many rows do you have in total in the table? The decision is based on what proportion of the rows will be output by the index scan.

If a sufficiently high proportion of the table is going to be accessed, a bitmap index scan is used to ensure that as much of the disk access as possible is sequential. By contrast, a plain index scan does one-page-at-a-time random access to the table data. (And if the proportion of the table projected to be accessed is high enough, the index isn't used at all, and the entire table data loaded sequentially)

One issue is that the projection of how many rows from the table are going to be accessed is just an estimate. But as you can imagine, 'IT%' is likely to match more than 'ITQ%' (remember that the suffix isn't part of the index scan, only the final filter)