I have a function in Postgres 8.3.5 that selects data from multiple tables and dumps the result in a single table:
create or replace function test_function_2(startdate timestamp, enddate timestamp)
returns void as $$
begin
delete from cl_final_report;
INSERT INTO cl_final_report
SELECT
b.batchkey AS batchnumber,
pv.productkey,
p.name AS productname,
avg(r.value) AS avgchemlean,
sum(r.auxvalue) AS totalweight,
max(o.time) AS timecompleted
FROM result r
LEFT JOIN physicalvalue pv ON r.physicalvaluekey = pv.physicalvaluekey
LEFT JOIN product p ON pv.productkey = p.productkey
LEFT JOIN object o ON r.objectkey = o.objectkey
LEFT JOIN batch b ON o.batchkey = b.batchkey
WHERE pv.name = 'CL'::text AND
and o.time between startdate and enddate
GROUP BY b.batchkey, pv.productkey, p.name
end
$$ language plpgsql;
This function takes 113 seconds to complete using PgAdmin and executing this command:
select test_function_2('05/02/2013', '05/03/2013')
However, if I replace the input variables in the function with literals like this:
create or replace function test_function_2(startdate timestamp, enddate timestamp)
returns void as $$
begin
delete from cl_final_report;
INSERT INTO cl_final_report
SELECT
b.batchkey AS batchnumber,
pv.productkey,
p.name AS productname,
avg(r.value) AS avgchemlean,
sum(r.auxvalue) AS totalweight,
max(o.time) AS timecompleted
FROM result r
LEFT JOIN physicalvalue pv ON r.physicalvaluekey = pv.physicalvaluekey
LEFT JOIN product p ON pv.productkey = p.productkey
LEFT JOIN object o ON r.objectkey = o.objectkey
LEFT JOIN batch b ON o.batchkey = b.batchkey
WHERE pv.name = 'CL'::text AND
and o.time between '05/02/2013' and '05/03/2013'
GROUP BY b.batchkey, pv.productkey, p.name
end
$$ language plpgsql;
The function executes in less than 5 seconds.
I'm new to Postgres so there's probably something I'm missing, but I can't seem to find an answer anywhere.
You can force a new plan on each execution if you make the query dynamic:
create or replace function test_function_2(
startdate timestamp, enddate timestamp
) returns void as $function$
begin
delete from cl_final_report;
execute $$
INSERT INTO cl_final_report
SELECT
b.batchkey AS batchnumber,
pv.productkey,
p.name AS productname,
avg(r.value) AS avgchemlean,
sum(r.auxvalue) AS totalweight,
max(o.time) AS timecompleted
FROM result r
LEFT JOIN physicalvalue pv ON r.physicalvaluekey = pv.physicalvaluekey
LEFT JOIN product p ON pv.productkey = p.productkey
LEFT JOIN object o ON r.objectkey = o.objectkey
LEFT JOIN batch b ON o.batchkey = b.batchkey
WHERE pv.name = 'CL'::text AND
and o.time between $1 and $2
GROUP BY b.batchkey, pv.productkey, p.name
$$ using startdate, enddate;
end;
$function$ language plpgsql;
For it to work in 8.3 without using
do string concatenation:
create or replace function test_function_2(
startdate timestamp, enddate timestamp
) returns void as $function$
begin
delete from cl_final_report;
execute $$
INSERT INTO cl_final_report
SELECT
b.batchkey AS batchnumber,
pv.productkey,
p.name AS productname,
avg(r.value) AS avgchemlean,
sum(r.auxvalue) AS totalweight,
max(o.time) AS timecompleted
FROM result r
LEFT JOIN physicalvalue pv ON r.physicalvaluekey = pv.physicalvaluekey
LEFT JOIN product p ON pv.productkey = p.productkey
LEFT JOIN object o ON r.objectkey = o.objectkey
LEFT JOIN batch b ON o.batchkey = b.batchkey
WHERE pv.name = 'CL'::text AND
and o.time between '$$ || startdate || $$' and '$$ || enddate || $$'
GROUP BY b.batchkey, pv.productkey, p.name
$$;
end;
$function$ language plpgsql;
@A.H's explanation is accurate for PostgreSQL 9.1 or older. So it is
applicable for the OP, who is using the outdated version 8.3.
However, PostgreSQL 9.2 brought a substantial update in this area. PL/pgSQL functions have become a lot smarter about when to replan. I am quoting the release notes for 9.2 here
E.5.3.1.3. Optimizer
Allow the planner to generate custom plans for specific parameter values even when using prepared statements (Tom Lane)
In the past, a prepared statement always had a single "generic" plan that was used for all parameter values, which was frequently much
inferior to the plans used for non-prepared statements containing
explicit constant values. Now, the planner attempts to generate custom
plans for specific parameter values. A generic plan will only be used
after custom plans have repeatedly proven to provide no benefit. This
change should eliminate the performance penalties formerly seen from
use of prepared statements (including non-dynamic statements in
PL/pgSQL).
Bold emphasis mine.
Ergo: One solution for the OP would be to upgrade to PostgreSQL 9.2+ and everything should just work fine automatically.
The query planner / optimizer can calculate a better plan when it has constants at hand.
When no constants are used the planner must generate a plan which is acceptable for all possible values of startdate
and enddate
. If the difference between these two values is very large, then a large part of the table must be fetched. In this case indizes are not used in most cases, because random access costs are higher than linear reads.
But when there are constants, then the planner can calculate based on gathered statistics, that the query will only touch a tiny fraction of the table and hence an index might be faster.
This is a common irk with the PostgreSQL query planner. The manual contains some hints in the PREPARE section (PREPARE
is used internally by pl/pgsql):
In some situations, the query plan produced for a prepared statement will be inferior to the query plan that would have been chosen if the statement had been submitted and executed normally. This is because when the statement is planned and the planner attempts to determine the optimal query plan, the actual values of any parameters specified in the statement are unavailable. PostgreSQL collects statistics on the distribution of data in the table, and can use constant values in a statement to make guesses about the likely result of executing the statement. Since this data is unavailable when planning prepared statements with parameters, the chosen plan might be suboptimal. To examine the query plan PostgreSQL has chosen for a prepared statement, use EXPLAIN.