GIN index on smallint[] column not used or error “

2020-02-13 03:39发布

问题:

create table test(
    id serial primary key,
    tagged smallint[]
);

There is gin index on tagged column, with _int2_ops operator class:

CREATE INDEX ix ON test USING GIN(col _int2_ops);

When I run this query:

select * from test
where tagged @> ARRAY[11]
order by id limit 100;

EXPLAIN ANALYZE shows:

Limit  (cost=0.43..19524.39 rows=100 width=36) (actual time=25024.124..25027.263 rows=100 loops=1)
  ->  Index Scan using test_pkey on test  (cost=0.43..508404.37 rows=2604 width=36) (actual time=25024.121..25027.251 rows=100 loops=1)
        Filter: ((tagged)::integer[] @> '{11}'::integer[])
        Rows Removed by Filter: 2399999
Planning time: 6.912 ms
Execution time: 25027.307 ms

Bold emphasis mine. Why is the tagged column converted to integer[] type? I think this is the reason why GIN the index isn't used and the query runs slow.

I tried WHERE tagged @> ARRAY[11]::smallint[] but got this error:

operator is not unique: smallint[] @> smallint[]

If I do the same but use tagged int[] and create index as

CREATE INDEX ix ON test USING GIN(tagged gin__int_ops);

then the above query uses the GIN index:

"->  Bitmap Index Scan on ix  (cost=0.00..1575.53 rows=2604 width=0) (actual time=382.840..382.840 rows=2604480 loops=1)"
"   Index Cond: (tagged @> '{11}'::integer[])"

This is a bit faster than previous, but It takes on average 10 second - still too slow. I want to try smallint[] instead of int[], maybe that will be faster ...

回答1:

Solution

Most probably, the solution is to schema-qualify the operator:

SELECT *
FROM   test
WHERE  tagged OPERATOR(pg_catalog.@>) '{11}'::int2[]
ORDER  BY id
LIMIT  100;

Why?

It's a problem of operator resolution (in combination with type resolution and cast context).

In standard Postgres, there is only a single candidate operator anyarray @> anyarray, that's the one you want.

Your setup would work just fine if you had not installed the additional module intarray (my assumption), which provides another operator for integer[] @> integer[].

Hence, another solution would be to use integer[] instead and have a GIN index with the gin__int_ops operator class. Or try the (default for intarray) gist__int_ops index. Either might be faster, but both don't allow NULL values.
Or you could rename the intarray operator @> to disambiguate. (I would not do that. Upgrade and portability issues ensue.)

For expressions involving at least one operand of type integer[], Postgres knows which operator to pick: the intarray operator. But then the index is not applicable, because the intarray operator only operates on integer (int4) not int2. And indexes are strictly bound to operators:

  • Can PostgreSQL index array columns?
  • PostgreSQL behavior in presence of two different type of indexes on the same column

But for int2[] @> int2[], Postgres is unable to decide the best operator. Both seem equally applicable. Since the default operator is provided in the pg_catalog schema and the intarray operator is provided in the public schema (by default - or wherever you installed the extension), you can help solve the conundrum by schema-qualifying the operator with the OPERATOR() construct. Related:

  • Compare arrays for equality, ignoring order of elements

The error message you get is a bit misleading. But if you look closely, there is a HINT line added which hints (tada!) in the right direction:

ERROR:  operator is not unique: smallint[] @> smallint[]
LINE 1: SELECT NULL::int2[] @> NULL::int2[]
                            ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

You can investigate existing operator candidates for @> with:

SELECT o.oid, *, oprleft::regtype, oprright::regtype, n.nspname
FROM   pg_operator o
JOIN   pg_namespace n ON n.oid = o.oprnamespace
WHERE  oprname = '@>';

Another alternative solution would be to temporarily(!) set a different search_path, so only the desired operator is found. In the same transaction:

SET LOCAL search_path = pg_catalog;
SELECT ...

But then you have to schema-qualify all tables in the query.

About cast context:

  • Generate series of dates - using date type as input

You could change the castcontext of int2 -> int4. But I strongly advise against it. Too many possible side effects:

  • Is there any way to cast postgresql 9.3 data type so that it can affect only one side