Hi any ideas how to speed up this query?
Input
EXPLAIN SELECT entityid FROM entity e
LEFT JOIN level1entity l1 ON l.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
WHERE
l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
AND
(entityid NOT IN
(1377776,1377792,1377793,1377794,1377795,1377796... 50000 ids)
)
Output
Nested Loop (cost=0.00..1452373.79 rows=3865 width=8)
-> Nested Loop (cost=0.00..8.58 rows=1 width=8)
Join Filter: (l1.level2_level2id = l2.level2id)
-> Seq Scan on level2entity l2 (cost=0.00..3.17 rows=1 width=8)
Filter: ((userid)::text = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'::text)
-> Seq Scan on level1entity l1 (cost=0.00..4.07 rows=107 width=16)
-> Index Scan using fk_fk18edb1cfb2a41235_idx on entity e (cost=0.00..1452086.09 rows=22329 width=16)
Index Cond: (level1_level1id = l1.level1id)
OK here a simplified version, the joins aren't the bottleneck
SELECT enitityid FROM
(SELECT enitityid FROM enitity e LIMIT 5000) a
WHERE
(enitityid NOT IN
(1377776,1377792,1377793,1377794,1377795, ... 50000 ids)
)
the problem is to find the enties which don't have any of these ids
EXPLAIN
Subquery Scan on a (cost=0.00..312667.76 rows=1 width=8)
Filter: (e.entityid <> ALL ('{1377776,1377792,1377793,1377794, ... 50000 ids}'::bigint[]))
-> Limit (cost=0.00..111.51 rows=5000 width=8)
-> Seq Scan on entity e (cost=0.00..29015.26 rows=1301026 width=8)
You might get a better result if you can rewrite the query to use a hash anti-join.
Something like:
A huge
IN
list is very inefficient. PostgreSQL should ideally identify it and turn it into a relation that it does an anti-join on, but at this point the query planner doesn't know how to do that, and the planning time required to identify this case would cost every query that usesNOT IN
sensibly, so it'd have to be a very low cost check. See this earlier much more detailed answer on the topic.As David Aldridge wrote this is best solved by turning it into an anti-join. I'd write it as a join over a
VALUES
list simply because PostgreSQL is extremely fast at parsingVALUES
lists into relations, but the effect is the same:For a sufficiently large set of values you might even be better off creating a temporary table,
COPY
ing the values into it, creating aPRIMARY KEY
on it, and joining on that.More possibilities explored here:
https://stackoverflow.com/a/17038097/398670
Since you are requiring level2entity record because of your where clause check for a specific userid "l2.userid = " You should make your "LEFT JOIN level2entity" into an "INNER JOIN level2entity"
This will, hopefully, filter down your entity's so your NOT IN will have less work to do.
ok my solution was
as explained in
http://blog.hagander.net/archives/66-Speeding-up-NOT-IN.html