PostgreSQL WITH RECURSIVE performance

2020-06-09 01:41发布

问题:

I have a simple question. Somehow I was unable to find a definitive answer.

How much is WITH RECURSIVE syntax optimized in PostgreSQL? By that I mean: is it merely a syntactic sugar for a series of non recursive queries, OR is it more of a single statement that despite its complicated semantics has been optimized as a whole. A follow-up question - just about how much is it possible to optimize this kind of syntax? Of course some concrete data on the matter is most welcome.

回答1:

My experience is that it is indeed very well optimized.

Check out the execution plan for your query generated by EXPLAIN ANALYZE and you'll see how "costly" it really is (and then compare that e.g. to a self written recursive function)



回答2:

I've found it optimized up to a point.

The various subqueries are re-used as expected and are optimized individually, and Postgres optimizes the latter just like any other query.

My main gripe with it has to do with that it won't inject constraints into the CTEs when it could.

For instance:

with recursive
parents as (
select node.id,
       node.parent_id
from nodes as node
union all
select node.id,
       parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents
where id = 2;

Postgres would ideally understand, in the above, that (since node.id is returned as is) it can do:

with recursive
parents as (
select node.id,
       node.parent_id
from nodes as node
where id = 2
union all
select node.id,
       parent.parent_id
from parents as node
join nodes as parent on parent.id = node.parent_id
)
select parent_id
from parents;

... and use an index scan on the primary key. In practice, it'll actually do exactly when the CTE tells it to do: recursively pull all parents for all rows, place the result set in an unnamed temporary table if needed, and then check each row from the result set one for id = 2.

In other words, a CTE does not keep a trace of the "originating" table/row/column set that it's returning. Until this gets optimized properly, creating a view on a recursive query is crazy at best.

A good workaround in the meanwhile is to create an sql function instead:

create function parents(id int) as returns table (id int) $$
    with recursive
    parents as (
    select node.id,
           node.parent_id
    from nodes as node
    where id = $1
    union all
    select node.id,
           parent.parent_id
    from parents as node
    join nodes as parent on parent.id = node.parent_id
    )
    select parent_id
    from parents;
$$ language sql stable strict rows 5 cost 1;

Another issue is you can't use FOR UPDATE with recursive CTEs (for very much the same reason, in fact).