Postgres not using index for date field

Posted 2020-04-20 14:18

I have created an index like this:

CREATE INDEX bill_open_date_idx ON bill USING btree(date(open_date));

and the column is defined as:

Column      |            Type
open_date   | timestamp without time zone

The EXPLAIN ANALYZE output for each query is as follows:

CASE 1

explain analyze select * from bill where open_date >=date('2018-01-01');
Seq Scan on bill  (cost=0.00..345264.60 rows=24813 width=1132) (actual time=0.007..1305.730 rows=5908 loops=1)    
    Filter: (open_date >= '2018-01-01'::date)    
    Rows Removed by Filter: 3238812  
Total runtime: 1306.176 ms

CASE 2

explain analyze select * from bill where open_date>='2018-01-01';
Seq Scan on bill  (cost=0.00..345264.60 rows=24813 width=1132) (actual time=0.006..1220.697 rows=5908 loops=1)    
  Filter: (open_date>= '2018-01-01 00:00:00'::timestamp without time zone)       
  Rows Removed by Filter: 3238812  
Total runtime: 1221.131 ms

CASE 3

explain analyze select * from bill where date(open_date) >='2018-01-01';
Index Scan using idx_bill_open_date on bill  (cost=0.43..11063.18 rows=22747 width=1132) (actual time=0.016..4.744 rows=5908 loops=1)
    Index Cond: (date(open_date) >= '2018-01-01'::date)  
Total runtime: 5.236 ms 
(3 rows)

I have researched why this is happening, but I could not find a proper explanation anywhere. Only case 3 uses the index I created; the others do not. Why is this happening?

As far as I understand, case 2 compares the column open_date with a string, and that is why it does not use the index. But why doesn't case 1 use it? Please correct me if I am wrong.

Thanks in advance!

Edit 1: I would also like to understand in depth what is happening.

The following is an excerpt from the gist (https://gist.github.com/cobusc/5875282):

It is strange though that PostgreSQL rewrites the function used to create the index to a canonical form, but does not seem to do the same when the function is used in the WHERE clause (in order to match the index function).

Still, it is unclear to me why the PostgreSQL developers did not make the planner use a closely matching index (or is my index useless unless I explicitly cast to date, as in case 3?), considering how mature and scalable PostgreSQL is.

Answer by 兄弟一词,经得起流年. · 2020-04-20 14:37

A b-tree index can only be used for a search condition if the condition looks like this:

<indexed expression> <operator> <expression that is constant during the index scan>
  • The <indexed expression> must be the expression you used in the CREATE INDEX statement.

  • The <operator> must belong to the default operator class for the data type and the index access method, or to the operator class specified in CREATE INDEX.

  • The <expression that is constant during the index scan> can be a constant or can contain IMMUTABLE or STABLE functions and operators, but nothing VOLATILE.
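The three conditions above can be modeled as a toy Python check and applied to the three queries from the question. This is only an illustration, not PostgreSQL code: expressions are compared as plain text rather than as parse trees, and `BTREE_OPERATORS` and `VOLATILE_FUNCTIONS` are hypothetical, hand-picked stand-ins for information PostgreSQL takes from its system catalogs.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for catalog lookups (not real PostgreSQL data):
BTREE_OPERATORS = {"<", "<=", "=", ">=", ">"}          # operators of the index's opclass
VOLATILE_FUNCTIONS = ("random(", "clock_timestamp(")  # names of VOLATILE functions


@dataclass
class Clause:
    left: str   # expression left of the operator (text stands in for a parse tree)
    op: str     # comparison operator
    right: str  # expression right of the operator


def can_use_index(clause: Clause, indexed_expr: str) -> bool:
    """Apply the three conditions for using a b-tree index (toy model)."""
    # 1. The left side must be exactly the expression used in CREATE INDEX.
    if clause.left != indexed_expr:
        return False
    # 2. The operator must belong to the index's operator class.
    if clause.op not in BTREE_OPERATORS:
        return False
    # 3. The right side must be constant during the scan: nothing VOLATILE.
    return not any(fn in clause.right for fn in VOLATILE_FUNCTIONS)


INDEX_EXPR = "date(open_date)"  # the expression from CREATE INDEX bill_open_date_idx
cases = {
    "case 1": Clause("open_date", ">=", "date('2018-01-01')"),
    "case 2": Clause("open_date", ">=", "'2018-01-01'"),
    "case 3": Clause("date(open_date)", ">=", "'2018-01-01'"),
}
for name, clause in cases.items():
    print(name, can_use_index(clause, INDEX_EXPR))
```

Only case 3 passes, and it fails for cases 1 and 2 at the first check, because their left side is the bare column rather than date(open_date).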

All your queries satisfy the last two conditions, but only the third one satisfies the first: the index was built on date(open_date), while cases 1 and 2 compare the bare column open_date. That is why only that query can use the index.
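Edit 1 asks why the planner does not simply reuse the nearby date(open_date) index for cases 1 and 2. A small Python check shows that the two conditions are not interchangeable in general. The data is made up, and `pg_date` is a hypothetical stand-in for PostgreSQL's date() applied to a timestamp. The conditions agree for a midnight bound such as '2018-01-01', but for any other bound a range on date(open_date) returns a superset of the wanted rows. This illustrates the arithmetic only, not PostgreSQL's planner code.

```python
from datetime import date, datetime, time


def pg_date(ts):
    """Stand-in for PostgreSQL's date(timestamp): keep only the calendar day."""
    return ts.date()


# Made-up open_date values, not data from the question.
rows = [
    datetime(2017, 12, 31, 23, 59),
    datetime(2018, 1, 1, 0, 0),
    datetime(2018, 1, 1, 9, 30),
    datetime(2018, 1, 1, 18, 0),
    datetime(2018, 1, 2, 8, 0),
]


def on_column(bound):
    """Rows matching open_date >= bound (cases 1 and 2 compare the bare column)."""
    return [r for r in rows if r >= bound]


def on_index_key(day):
    """Rows a range on date(open_date) >= day would return (the case 3 shape)."""
    return [r for r in rows if pg_date(r) >= day]


# With a midnight bound the two conditions select exactly the same rows ...
midnight = datetime.combine(date(2018, 1, 1), time.min)
print(on_column(midnight) == on_index_key(midnight.date()))  # True

# ... but with any other bound the range on date(open_date) returns extra rows
# that would have to be filtered out again by rechecking open_date >= bound.
noon = datetime(2018, 1, 1, 12, 0)
extra = [r for r in on_index_key(noon.date()) if r not in on_column(noon)]
print(extra)  # the 00:00 and 09:30 rows of 2018-01-01
```

Reusing this index for a condition on the bare column would therefore require the planner to know that date() preserves ordering and to recheck every row it returns. The basic matching rule above does not attempt that; it only compares the clause with the indexed expression.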

For documentation that covers this in excruciating detail, see the comment for match_clause_to_indexcol in postgresql/src/backend/optimizer/path/indxpath.c:

/*
 * match_clause_to_indexcol()
 *    Determine whether a restriction clause matches a column of an index,
 *    and if so, build an IndexClause node describing the details.
 *
 *    To match an index normally, an operator clause:
 *
 *    (1)  must be in the form (indexkey op const) or (const op indexkey);
 *         and
 *    (2)  must contain an operator which is in the index's operator family
 *         for this column; and
 *    (3)  must match the collation of the index, if collation is relevant.
 *
 *    Our definition of "const" is exceedingly liberal: we allow anything that
 *    doesn't involve a volatile function or a Var of the index's relation.
 *    In particular, Vars belonging to other relations of the query are
 *    accepted here, since a clause of that form can be used in a
 *    parameterized indexscan.  It's the responsibility of higher code levels
 *    to manage restriction and join clauses appropriately.
 *
 *    Note: we do need to check for Vars of the index's relation on the
 *    "const" side of the clause, since clauses like (a.f1 OP (b.f2 OP a.f3))
 *    are not processable by a parameterized indexscan on a.f1, whereas
 *    something like (a.f1 OP (b.f2 OP c.f3)) is.
 *
 *    Presently, the executor can only deal with indexquals that have the
 *    indexkey on the left, so we can only use clauses that have the indexkey
 *    on the right if we can commute the clause to put the key on the left.
 *    We handle that by generating an IndexClause with the correctly-commuted
 *    opclause as a derived indexqual.
 *
 *    If the index has a collation, the clause must have the same collation.
 *    For collation-less indexes, we assume it doesn't matter; this is
 *    necessary for cases like "hstore ? text", wherein hstore's operators
 *    don't care about collation but the clause will get marked with a
 *    collation anyway because of the text argument.  (This logic is
 *    embodied in the macro IndexCollMatchesExprColl.)
 *
 *    It is also possible to match RowCompareExpr clauses to indexes (but
 *    currently, only btree indexes handle this).
 *
 *    It is also possible to match ScalarArrayOpExpr clauses to indexes, when
 *    the clause is of the form "indexkey op ANY (arrayconst)".
 *
 *    For boolean indexes, it is also possible to match the clause directly
 *    to the indexkey; or perhaps the clause is (NOT indexkey).
 *
 *    And, last but not least, some operators and functions can be processed
 *    to derive (typically lossy) indexquals from a clause that isn't in
 *    itself indexable.  If we see that any operand of an OpExpr or FuncExpr
 *    matches the index key, and the function has a planner support function
 *    attached to it, we'll invoke the support function to see if such an
 *    indexqual can be built.
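The parts of the comment about the clause shape, the liberal definition of "const", and commuting (const op indexkey) can be sketched as another toy Python model. Everything here is hypothetical: `Operand`, `OpClause`, `as_indexqual` and the `COMMUTATORS` table are made-up names, and text stands in for expression trees. The operator-family and collation checks, conditions (2) and (3), are left out, as are the RowCompareExpr, ScalarArrayOpExpr, boolean and support-function cases.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Operand:
    expr: str                           # text standing in for an expression tree
    rels: FrozenSet[str] = frozenset()  # relations whose columns (Vars) it references
    volatile: bool = False              # does it call a VOLATILE function?


@dataclass(frozen=True)
class OpClause:
    left: Operand
    op: str
    right: Operand


# Hypothetical commutator table: (x op y) means the same as (y COMMUTATORS[op] x).
COMMUTATORS = {"<": ">", "<=": ">=", "=": "=", ">=": "<=", ">": "<"}


def is_pseudo_const(side: Operand, index_rel: str) -> bool:
    """'Const' in the comment's liberal sense: no volatile function and
    no Var of the index's own relation (Vars of other relations are allowed)."""
    return not side.volatile and index_rel not in side.rels


def as_indexqual(clause: OpClause, index_expr: str, index_rel: str) -> Optional[OpClause]:
    """Return the clause with the index key on the left, or None if it cannot be used."""
    # (indexkey op const): usable as-is.
    if clause.left.expr == index_expr and is_pseudo_const(clause.right, index_rel):
        return clause
    # (const op indexkey): usable only if op has a commutator.
    if clause.right.expr == index_expr and is_pseudo_const(clause.left, index_rel):
        commuted = COMMUTATORS.get(clause.op)
        if commuted is not None:
            return OpClause(clause.right, commuted, clause.left)
    return None


# '2018-01-01' <= date(open_date) is commuted to date(open_date) >= '2018-01-01'.
key = Operand("date(open_date)", frozenset({"bill"}))
lit = Operand("'2018-01-01'::date")
q = as_indexqual(OpClause(lit, "<=", key), "date(open_date)", "bill")
print(q.left.expr, q.op, q.right.expr)  # date(open_date) >= '2018-01-01'::date

# The comment's own examples for an index on a.f1:
a_f1 = Operand("a.f1", frozenset({"a"}))
bad = OpClause(a_f1, "=", Operand("b.f2 OP a.f3", frozenset({"a", "b"})))
good = OpClause(a_f1, "=", Operand("b.f2 OP c.f3", frozenset({"b", "c"})))
print(as_indexqual(bad, "a.f1", "a"))               # None
print(as_indexqual(good, "a.f1", "a") is not None)  # True
```

The commuted clause keeps the index key on the left, in line with the comment's note that the executor only handles indexquals in that orientation.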