I have a function with two nested cursors. The outer cursor fetches the payments of a customer from the source and inserts them into the target based on some business logic. The inner cursor fetches the payment details for each payment, one payment at a time.
The payments table has about 125,000 rows, and the payment details table about 335,000 rows. All of these rows have to be migrated to the target tables. Executing the function takes over two hours, and database CPU usage goes up to 99%.
I am working with PostgreSQL 9.2.
How can I improve the performance of the function?
The code I am using:
CREATE OR REPLACE FUNCTION ccdb_dummy.o_payments1(a integer)
  RETURNS void AS
$BODY$
DECLARE
    cursor_1           refcursor;
    cursor_2           refcursor;
    rowcount           integer;
    rowcount1          integer;
    payment_no         bigint;
    query_1            character varying(250);
    lc_cin_num         bigint;
    test               character varying(50);
    t_payments         ccdb_stg.o_payments_stg%ROWTYPE;
    t_payments_details ccdb_stg.o_payment_head_dtls_stg%ROWTYPE;
BEGIN
    rowcount := 0;
    open cursor_1 for select * from ccdb_stg.o_payments_stg where section_code = a;
    select count(1) into rowcount from ccdb_stg.o_payments_stg where section_code = a;

    for i in 1..rowcount loop
        fetch cursor_1 into t_payments;
        payment_no = nextval('ccdb_stg.payments_seq');

        select cin into lc_cin_num
        from   ccdb_dummy.consumers a
        where  a.consumer_num = t_payments.consumer_num;

        insert into ccdb_dummy.payments
               (payment_id, receipt_id, source_system_flag, cin, consumer_nbr, cust_connection_id
              , cust_type_flg, receipt_type_id, mop_code, mop_details, coll_effect_date, coll_entry_date
              , receipt_num, receipt_amt, receipt_loc_flg, receipt_date, cancel_flag, acc_type_id
              , cust_section_code, coll_section_code, remarks, pm_paydate, pm_amount
              , ref_transaction_id, creation_dt, created_by)
        values (payment_no, t_payments.receipt_id, t_payments.origin_flag, lc_cin_num
              , t_payments.consumer_num, t_payments.cust_connection_id, t_payments.cust_type_flag
              , t_payments.receipt_type_id, t_payments.mop_id, t_payments.mop_details
              , t_payments.coll_effect_date, t_payments.coll_entry_date, t_payments.receipt_num
              , t_payments.receipt_amt, t_payments.receipt_flag, t_payments.receipt_date
              , t_payments.cancel_flag, t_payments.acc_type_flag, t_payments.cust_section_code
              , t_payments.coll_section_code, t_payments.remarks, t_payments.pm_paydate
              , t_payments.pm_amount, null, now(), 'system');

        select count(1) into rowcount1
        from   ccdb_stg.o_payment_head_dtls_stg a
        where  a.mbc_receipt_id = t_payments.receipt_id;

        open cursor_2 for
            select * from ccdb_stg.o_payment_head_dtls_stg a
            where  a.mbc_receipt_id = t_payments.receipt_id;

        for i in 1..rowcount1 loop
            fetch cursor_2 into t_payments_details;
            insert into ccdb_dummy.payment_head_dtls
                   (payment_id, mbc_receipt_id, charge_head_code, amount, tariff_id, creation_dt, created_by)
            values (payment_no, t_payments_details.mbc_receipt_id, t_payments_details.charge_head_code
                  , t_payments_details.amount, t_payments_details.tariff_id, now(), 'system');
        end loop;
        close cursor_2;
    end loop;
    close cursor_1;
END;
$BODY$
LANGUAGE plpgsql;
In a first step, I radically simplified your procedural code:
- Use the implicit cursor of a FOR loop instead of unwieldy explicit cursors coupled with redundant counts and loops. Much simpler and faster. Read the chapter "Looping Through Query Results" in the manual.
- LEFT JOIN to ccdb_dummy.consumers in the first SELECT instead of running a separate SELECT for every row.
- Also include nextval('ccdb_stg.payments_seq') AS payment_no in the first SELECT. Cheaper than lots of separate queries.
- Minor detail: the assignment operator in plpgsql is :=, not =. See the manual for details.
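Putting those points together, the simplified loop version could look something like the sketch below. It is untested and assumes the column names visible in your original INSERT statements, and that the staging table has no columns already named payment_no or lc_cin_num. The loop variable has to be a plain record, because the query adds columns to the staging row.

CREATE OR REPLACE FUNCTION ccdb_dummy.o_payments1(a integer)
  RETURNS void AS
$BODY$
DECLARE
    t_payments         record;  -- record, because the query adds columns to the staging row
    t_payments_details ccdb_stg.o_payment_head_dtls_stg%ROWTYPE;
BEGIN
    FOR t_payments IN
        SELECT p.*
             , nextval('ccdb_stg.payments_seq') AS payment_no  -- sequence value fetched in the same query
             , c.cin AS lc_cin_num                             -- join replaces the per-row SELECT
        FROM   ccdb_stg.o_payments_stg p
        LEFT   JOIN ccdb_dummy.consumers c ON c.consumer_num = p.consumer_num
        WHERE  p.section_code = a
    LOOP
        INSERT INTO ccdb_dummy.payments
               (payment_id, receipt_id, source_system_flag, cin, consumer_nbr, cust_connection_id
              , cust_type_flg, receipt_type_id, mop_code, mop_details, coll_effect_date, coll_entry_date
              , receipt_num, receipt_amt, receipt_loc_flg, receipt_date, cancel_flag, acc_type_id
              , cust_section_code, coll_section_code, remarks, pm_paydate, pm_amount
              , ref_transaction_id, creation_dt, created_by)
        VALUES (t_payments.payment_no, t_payments.receipt_id, t_payments.origin_flag, t_payments.lc_cin_num
              , t_payments.consumer_num, t_payments.cust_connection_id, t_payments.cust_type_flag
              , t_payments.receipt_type_id, t_payments.mop_id, t_payments.mop_details
              , t_payments.coll_effect_date, t_payments.coll_entry_date, t_payments.receipt_num
              , t_payments.receipt_amt, t_payments.receipt_flag, t_payments.receipt_date
              , t_payments.cancel_flag, t_payments.acc_type_flag, t_payments.cust_section_code
              , t_payments.coll_section_code, t_payments.remarks, t_payments.pm_paydate
              , t_payments.pm_amount, NULL, now(), 'system');

        FOR t_payments_details IN
            SELECT *
            FROM   ccdb_stg.o_payment_head_dtls_stg d
            WHERE  d.mbc_receipt_id = t_payments.receipt_id
        LOOP
            INSERT INTO ccdb_dummy.payment_head_dtls
                   (payment_id, mbc_receipt_id, charge_head_code, amount, tariff_id, creation_dt, created_by)
            VALUES (t_payments.payment_no, t_payments_details.mbc_receipt_id, t_payments_details.charge_head_code
                  , t_payments_details.amount, t_payments_details.tariff_id, now(), 'system');
        END LOOP;
    END LOOP;
END;
$BODY$ LANGUAGE plpgsql;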
But that's still far from perfect. Consider a completely new approach with set-based operations instead of individual INSERTs in loops. Much cleaner and faster yet; that's how modern RDBMS operate best.
One SQL statement with a data-modifying CTE, wrapped into an SQL function to be a drop-in replacement:
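Here is a sketch of that approach, again based only on the columns visible in your original function and untested against your schema. The first CTE computes everything once per staging row (sequence value and consumer join included); since a plain CTE is evaluated only once, both inserts see the same payment_no values.

CREATE OR REPLACE FUNCTION ccdb_dummy.o_payments1(a integer)
  RETURNS void AS
$BODY$
WITH sel AS (  -- one scan over the staging table; sequence value and consumer join included
   SELECT p.*
        , nextval('ccdb_stg.payments_seq') AS payment_no
        , c.cin AS lc_cin_num
   FROM   ccdb_stg.o_payments_stg p
   LEFT   JOIN ccdb_dummy.consumers c ON c.consumer_num = p.consumer_num
   WHERE  p.section_code = $1  -- $1 is the function parameter a
), ins AS (    -- data-modifying CTE: insert all payments in one go
   INSERT INTO ccdb_dummy.payments
          (payment_id, receipt_id, source_system_flag, cin, consumer_nbr, cust_connection_id
         , cust_type_flg, receipt_type_id, mop_code, mop_details, coll_effect_date, coll_entry_date
         , receipt_num, receipt_amt, receipt_loc_flg, receipt_date, cancel_flag, acc_type_id
         , cust_section_code, coll_section_code, remarks, pm_paydate, pm_amount
         , ref_transaction_id, creation_dt, created_by)
   SELECT payment_no, receipt_id, origin_flag, lc_cin_num, consumer_num, cust_connection_id
        , cust_type_flag, receipt_type_id, mop_id, mop_details, coll_effect_date, coll_entry_date
        , receipt_num, receipt_amt, receipt_flag, receipt_date, cancel_flag, acc_type_flag
        , cust_section_code, coll_section_code, remarks, pm_paydate, pm_amount
        , NULL, now(), 'system'
   FROM   sel
)
INSERT INTO ccdb_dummy.payment_head_dtls
       (payment_id, mbc_receipt_id, charge_head_code, amount, tariff_id, creation_dt, created_by)
SELECT sel.payment_no, d.mbc_receipt_id, d.charge_head_code, d.amount, d.tariff_id, now(), 'system'
FROM   sel
JOIN   ccdb_stg.o_payment_head_dtls_stg d ON d.mbc_receipt_id = sel.receipt_id;
$BODY$ LANGUAGE sql;

Call it exactly like the original, for example: SELECT ccdb_dummy.o_payments1(1);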
Data-modifying CTEs require Postgres 9.1 or later.
It should do exactly the same as the plpgsql version above (barring errors in translation), just much simpler and faster.
Find more examples of INSERTs using data-modifying CTEs here on SO.