PostgreSQL daterange not using index correctly

2020-04-10 08:54发布

I have a simple table which has a user_birthday field with a type of date (which can be NULL value)

CREATE TABLE users
(
  user_id bigserial NOT NULL,
  user_email text NOT NULL,
  user_password text,
  user_first_name text NOT NULL,
  user_middle_name text,
  user_last_name text NOT NULL,
  user_birthday date,
  CONSTRAINT pk_users PRIMARY KEY (user_id)
)

There's an index (btree) defined on that field, with the rule of NOT user_birthday IS NULL.

CREATE INDEX ix_users_birthday
  ON users
  USING btree
  (user_birthday)
  WHERE NOT user_birthday IS NULL;

Trying to follow up on another idea, I've added the extension btree_gist and created the following index:

CREATE INDEX ix_users_birthday_gist
  ON glances.users
  USING gist
  (user_birthday)
  WHERE NOT user_birthday IS NULL;

But it had no affect either, as from what I could read it is not used for range checking.

The PostgreSQL version is 9.3.4.0 (22) Postgres.app and issue also exists in 9.3.3.0 (21) Postgres.app

I've been intrigued by the following queries:

Query #1:

EXPLAIN ANALYZE SELECT *
FROM users
WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01)')

Query #2:

EXPLAIN ANALYZE SELECT *
FROM users
WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date

which, at first glance both should have the same execution plan, but for some reason, here are the results:

Query #1:

"Seq Scan on users  (cost=0.00..52314.25 rows=11101 width=241) (actual
time=0.014..478.983 rows=208886 loops=1)"
"  Filter: (user_birthday <@ '[1978-07-15,1983-03-01)'::daterange)"
"  Rows Removed by Filter: 901214"
"Total runtime: 489.584 ms"

Query #2:

"Bitmap Heap Scan on users  (cost=4468.01..46060.53 rows=210301 width=241)
(actual time=57.104..489.785 rows=209019 loops=1)"
"  Recheck Cond: ((user_birthday >= '1978-07-15'::date) AND (user_birthday
<= '1983-03-01'::date))"
"  Rows Removed by Index Recheck: 611375"
"  ->  Bitmap Index Scan on ix_users_birthday  (cost=0.00..4415.44
rows=210301 width=0) (actual time=54.621..54.621 rows=209019 loops=1)"
"        Index Cond: ((user_birthday >= '1978-07-15'::date) AND
(user_birthday <= '1983-03-01'::date))"
"Total runtime: 500.983 ms"

As you can see, the <@ daterange is not utilizing the existing index, while BETWEEN does.

Important to note that the actual use case for this rule is in a more complex query, which doesn't result in the Recheck Cond and Bitmap Heap scan. In the application complex query, the difference between the two methods (with 1.2 million records) is massive: Query #1 at 415ms Query #2 at 84ms.

Is this a bug with daterange? Am I doing something wrong? or datarange <@ is performing as designed?

There's also a discussion in the pgsql-bugs mailing list

1条回答
放我归山
2楼-- · 2020-04-10 09:23

BETWEEN includes upper and lower border. Your condition

WHERE user_birthday BETWEEN '1978-07-15'::date AND '1983-03-01'::date

matches

WHERE user_birthday <@ daterange('[1978-07-15,1983-03-01]')

I see you mention a btree index. For that use simple comparison operators.

Detailed manual page on which index is good for which operators.

The range type operators <@ or @> would work with GiST indexes.
Example:
Perform this hours of operation query in PostgreSQL

查看更多
登录 后发表回答