create table test(
id serial primary key,
tagged smallint[]
);
There is gin index on tagged
column, with _int2_ops
operator class:
CREATE INDEX ix ON test USING GIN(col _int2_ops);
When I run this query:
select * from test
where tagged @> ARRAY[11]
order by id limit 100;
EXPLAIN ANALYZE
shows:
Limit (cost=0.43..19524.39 rows=100 width=36) (actual time=25024.124..25027.263 rows=100 loops=1)
-> Index Scan using test_pkey on test (cost=0.43..508404.37 rows=2604 width=36) (actual time=25024.121..25027.251 rows=100 loops=1)
Filter: ((tagged)::integer[] @> '{11}'::integer[])
Rows Removed by Filter: 2399999
Planning time: 6.912 ms
Execution time: 25027.307 ms
Bold emphasis mine. Why is the tagged
column converted to integer[]
type? I think this is the reason why GIN the index isn't used and the query runs slow.
I tried WHERE tagged @> ARRAY[11]::smallint[]
but got this error:
operator is not unique: smallint[] @> smallint[]
If I do the same but use tagged int[]
and create index as
CREATE INDEX ix ON test USING GIN(tagged gin__int_ops);
then the above query uses the GIN index:
"-> Bitmap Index Scan on ix (cost=0.00..1575.53 rows=2604 width=0) (actual time=382.840..382.840 rows=2604480 loops=1)"
" Index Cond: (tagged @> '{11}'::integer[])"
This is a bit faster than previous, but It takes on average 10 second - still too slow. I want to try smallint[]
instead of int[]
, maybe that will be faster ...
Solution
Most probably, the solution is to schema-qualify the operator:
SELECT *
FROM test
WHERE tagged OPERATOR(pg_catalog.@>) '{11}'::int2[]
ORDER BY id
LIMIT 100;
Why?
It's a problem of operator resolution (in combination with type resolution and cast context).
In standard Postgres, there is only a single candidate operator anyarray @> anyarray
, that's the one you want.
Your setup would work just fine if you had not installed the additional module intarray (my assumption), which provides another operator for integer[] @> integer[]
.
Hence, another solution would be to use integer[]
instead and have a GIN index with the gin__int_ops
operator class. Or try the (default for intarray) gist__int_ops
index. Either might be faster, but both don't allow NULL values.
Or you could rename the intarray
operator @>
to disambiguate. (I would not do that. Upgrade and portability issues ensue.)
For expressions involving at least one operand of type integer[]
, Postgres knows which operator to pick: the intarray operator. But then the index is not applicable, because the intarray operator only operates on integer
(int4
) not int2
. And indexes are strictly bound to operators:
- Can PostgreSQL index array columns?
- PostgreSQL behavior in presence of two different type of indexes on the same column
But for int2[] @> int2[]
, Postgres is unable to decide the best operator. Both seem equally applicable. Since the default operator is provided in the pg_catalog
schema and the intarray operator is provided in the public
schema (by default - or wherever you installed the extension), you can help solve the conundrum by schema-qualifying the operator with the OPERATOR()
construct. Related:
- Compare arrays for equality, ignoring order of elements
The error message you get is a bit misleading. But if you look closely, there is a HINT
line added which hints (tada!) in the right direction:
ERROR: operator is not unique: smallint[] @> smallint[]
LINE 1: SELECT NULL::int2[] @> NULL::int2[]
^
HINT: Could not choose a best candidate operator. You might need to add explicit type casts.
You can investigate existing operator candidates for @>
with:
SELECT o.oid, *, oprleft::regtype, oprright::regtype, n.nspname
FROM pg_operator o
JOIN pg_namespace n ON n.oid = o.oprnamespace
WHERE oprname = '@>';
Another alternative solution would be to temporarily(!) set a different search_path, so only the desired operator is found. In the same transaction:
SET LOCAL search_path = pg_catalog;
SELECT ...
But then you have to schema-qualify all tables in the query.
About cast context:
- Generate series of dates - using date type as input
You could change the castcontext
of int2
-> int4
. But I strongly advise against it. Too many possible side effects:
- Is there any way to cast postgresql 9.3 data type so that it can affect only one side