Query plan caching with pl/pgsql

2019-02-18 01:18发布

问题:

I am having troubles understanding how the query plan caching works for pl/pgsql.

I want to built all-in-one queries with JOINs and IFs, so I will have multiple and different query parameters, and I will be searching in more that one tables.

At first I thought that using pl/pgsql will produce a different plan for each parameters combination, that is not the case, because I have more than one tables

SQL commands that appear directly in a PL/pgSQL function must refer to the same tables and columns on every execution; that is, you cannot use a parameter as the name of a table or column in an SQL command. To get around this restriction, you can construct dynamic commands using the PL/pgSQL EXECUTE statement — at the price of performing new parse analysis and constructing a new execution plan on every execution. from here

Performing a new analysis every time can slow things down I guess. If I dont use an EXECUTE then

If the statement has no parameters, or is executed many times, the SPI manager will consider creating a generic plan that is not dependent on specific parameter values, and caching that for re-use. Typically this will happen only if the execution plan is not very sensitive to the values of the PL/pgSQL variables referenced in it. If it is, generating a plan each time is a net win. from here

Should I use a generic plan then? Is it faster, or is it slower because there is no planning each time ? At least they are cached. My queries are sensitive to their variables, because they are dynamic, but what

If it is, generating a plan each time is a net win.

actually means? That using EXECUTE/plan each time, is better or worst than a generic one? "net win" confuses me.

If a generic plan is inaccurate and EXECUTE/planning each time is slower, then why bother using pl/pgsql? I can write a simple query with a couple of ifs then.

Bottom line is, I cannot conclude if EXECUTE/plan each time is better or worst than generic cached plan in terms of speed and plan caching. Please explain and advice, I am confused.

For reference, this is what I am creating. Works as it is now, but more IFs will be added for mytables and mywhere

DROP FUNCTION IF EXISTS __aa(ii int, fk int);
CREATE  FUNCTION __aa(ii int, fk int) RETURNS TABLE(id INTEGER,val text, fd integer) AS $$
DECLARE 
myt text;
mytables text;
mywhere text;
BEGIN

mytables := 'dyn_tab2';
mywhere := 'dyn_tab2.id=$1';
IF fk IS NOT NULL
THEN
mywhere := mywhere || 'AND dyn_tab2.fk_id=$2';
END IF;

RETURN QUERY EXECUTE format('
SELECT dyn_tab2.id, dyn_tab2.value, dyn_tab2.fk_id 
FROM %I WHERE ' ||mywhere,
mytables) 
USING ii, fk;

END;
$$
LANGUAGE 'plpgsql';

Thanks

回答1:

Plans for static queries (without EXECUTE) are always cached, plans for dynamic queries (with EXECUTE) cannot be cached.

In your case, it would be impossible to use a static query anyway, because, as you quote, that would mean that you can only use a fixed set of tables in your query.

I gather that you are confused by the discussion of the trade-offs between static and dynamic queries in the documentation.

Definition: query parameters are values that are not part of the query string, like $1 or a PL/pgSQL variable name in a static query.

For a static query, the procedure is as follows:

For the first 5 executions, it will be planned using the actual parameter values (“custom plan”), and if the estimated execution time is not significantly shorter than that of a plan that ignores the actual parameter values (“generic plan”), the generic plan will be used from the sixth execution on.

Since the generic plan is cached, that means that there is no planning cost from the sixth execution on.

Dynamic queries are planned every time they are executed.

The trade-off is the following: dynamic queries run up planning cost whenever they are executed, but since they are always planned with the actual parameter values, they end up with a better execution plan, which can save time during query execution.

Now if a query is sensitive to parameter values, that means that the optimal plan will vary significantly with the parameter values, so you will usually win if you plan the query every time.

Queries without parameters will always profit from plan caching, unless the table contents change a lot during the lifetime of a single session, so that the cached plan becomes suboptimal.