Effectively query on column that includes a substr

2020-04-16 06:37发布

问题:

Given a string column with a value similar to /123/12/34/56/5/, what is the optimal way of querying for all the records that include the given number (12 for example)?

The solution from top of my head is:

SELECT id FROM things WHERE things.path LIKE '%/12/%'

But AFAIK this query can't use indexes on the column due to the leading %.

There must be something better. What is it?

Using PostgreSQL, but would prefer the solution that would work across other DBs too.

回答1:

In PostgreSQL 9.1 you could utilize the pg_trgm module and build a GIN index with it.

CREATE EXTENSION pg_trgm; -- once per database

CREATE INDEX things_path_trgm_gin_idx ON things USING gin (path gin_trgm_ops);

Your LIKE expression can use this index even if it is not left-anchored.

See a detailed demo by depesz here.

Normalize it If you can, though.



回答2:

If you're happy turning that column into an array of integers, like:

'/123/12/34/56/5/' becomes ARRAY[123,12,34,56,5]

So that path_arr is a column of type INTEGER[], then you can create a GIN index on that column:

CREATE INDEX ON things USING gin(path_arr);

A query for all items containing 12 then becomes:

SELECT * FROM things WHERE ARRAY[12] <@ path_arr;

Which will use the index. In my test (with a million rows), I get plans like:

EXPLAIN SELECT * FROM things WHERE ARRAY[12]  <@ path_arr;
                                      QUERY PLAN
----------------------------------------------------------------------------------------
 Bitmap Heap Scan on things  (cost=5915.75..9216.99 rows=1000 width=92)
   Recheck Cond: (path_arr <@ '{12}'::integer[])
   ->  Bitmap Index Scan on things_path_arr_idx  (cost=0.00..5915.50 rows=1000 width=0)
         Index Cond: ('{12}'::integer[] <@ path_arr)
(4 rows)