Postgres combining multiple Indexes

2020-05-28 04:34发布

问题:

I have the following table/indexes -

CREATE TABLE test
(
   coords geography(Point,4326), 
   user_id varchar(50), 
   created_at timestamp
);
CREATE INDEX ix_coords ON test USING GIST (coords);
CREATE INDEX ix_user_id ON test (user_id);
CREATE INDEX ix_created_at ON test (created_at DESC);

This is the query I want to execute:

select * 
from updates 
where ST_DWithin(coords, ST_MakePoint(-126.4, 45.32)::geography, 30000) 
and user_id='3212312' 
order by created_at desc
limit 60

When I run the query it only uses ix_coords index. How can I ensure that Postgres uses ix_user_id and ix_created_at index as well for the query?

This is a new table in which I did bulk insert of production data. Total rows in the test table: 15,069,489

I am running PostgreSQL 9.2.1 (with Postgis) with (effective_cache_size = 2GB). This is my local OSX with 16GB RAM, Core i7/2.5 GHz, non-SSD disk.

Adding the EXPLAIN ANALYZE output -

Limit  (cost=71.64..71.65 rows=1 width=280) (actual time=1278.652..1278.665 rows=60 loops=1)
   ->  Sort  (cost=71.64..71.65 rows=1 width=280) (actual time=1278.651..1278.662 rows=60 loops=1)
         Sort Key: created_at
         Sort Method: top-N heapsort  Memory: 33kB
         ->  Index Scan using ix_coords on test  (cost=0.00..71.63 rows=1 width=280) (actual time=0.198..1278.227 rows=178 loops=1)
               Index Cond: (coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography)
               Filter: (((user_id)::text = '4f1092000b921a000100015c'::text) AND ('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
               Rows Removed by Filter: 3122459
 Total runtime: 1278.701 ms

UPDATE:

Based on the suggestions below I tried index on cords + user_id:

CREATE INDEX ix_coords_and_user_id ON updates USING GIST (coords, user_id);

..but get the following error:

ERROR:  data type character varying has no default operator class for access method "gist"
HINT:  You must specify an operator class for the index or define a default operator class for the data type.

UPDATE:

So the CREATE EXTENSION btree_gist; solved the btree/gist compound index issue. And now my index looks like

CREATE INDEX ix_coords_user_id_created_at ON test USING GIST (coords, user_id, created_at);

NOTE: btree_gist does not accept DESC/ASC.

New query plan:

Limit  (cost=134.99..135.00 rows=1 width=280) (actual time=273.282..273.292 rows=60 loops=1)
   ->  Sort  (cost=134.99..135.00 rows=1 width=280) (actual time=273.281..273.285 rows=60 loops=1)
         Sort Key: created_at
         Sort Method: quicksort  Memory: 41kB
         ->  Index Scan using ix_updates_coords_user_id_created_at on updates  (cost=0.00..134.98 rows=1 width=280) (actual time=0.406..273.110 rows=115 loops=1)
               Index Cond: ((coords && '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography) AND ((user_id)::text = '4e952bb5b9a77200010019ad'::text))
               Filter: (('0101000020E61000006666666666E63C40C3F5285C8F824440'::geography && _st_expand(coords, 30000::double precision)) AND _st_dwithin(coords, '0101000020E61000006666666666E63C40C3F5285C8F824440'::geography, 30000::double precision, true))
               Rows Removed by Filter: 1
 Total runtime: 273.331 ms

The query is performing better than before, almost a second better but still not great. I guess this is the best that I can get?? I was hoping somewhere around 60-80ms. Also taking order by created_at desc from the query, shaves off another 100ms, meaning it is unable to use the index. Anyway to fix this?

回答1:

I don't know if Pg can combine a GiST index and regular b-tree indexes with a bitmap index scan, but I suspect not. You may be getting the best result you can without adding a user_id column to your GiST index (and consequently making it bigger and slower for other queries that don't use user_id).

As an experiment you could:

CREATE EXTENSION btree_gist;
CREATE INDEX ix_coords_and_user_id ON test USING GIST (coords, user_id);

which is likely to result in a big index, but might boost that query - if it works. Be aware that maintaining such an index will significantly slow INSERT and UPDATEs. If you drop the old ix_coords your queries will use ix_coords_and_user_id even if they don't filter on user_id, but it'll be slower than ix_coords. Keeping both will make the INSERT and UPDATE slowdown even worse.

See btree-gist


(Obsoleted by edit to question that changes the question completely; when written the user had a multicolumn index they've now split into two separate ones):

You don't seem to be filtering or sorting on user_id, only create_date. Pg won't (can't?) use only the second term of a multi-column index like (user_id, create_date), it needs use of the first item too.

If you want to index create_date, create a separate index for it. If you use and need the (user_id, create_date) index and don't generally use just user_id alone, see if you can reverse the column order. Alternately create two independent indexes, (user_id) and (create_date). When both columns are needed Pg can combine the two indepependent indexes using a bitmap index scan.



回答2:

I think Craig is correct with his answer, but I just wanted to add a few things (and it wouldn't fit in a comment)

You have to work pretty hard to force PostgreSQL to use an index. The Query optimizer is smart and there are times where it will believe that a sequential table scan will be faster. It is usually right! :) But, you can play with some settings (such as seq_page_cost, random_page_cost, etc) you can play with to try and get it to favor an index. Here is a link to some of the configurations that you might want to examine if you feel like it is not making the correct decision. But, again... my experience is that most of the time, Postgres is smarter than I am! :)

Hope this helps you (or someone in the future).