Function taking forever to run for large number of records

Posted 2019-03-04 07:54

Question:

I have created the following function in Postgres 9.3.5:

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS 
$BODY$
DECLARE

result text;

BEGIN

select min(id) into result from tbl
where id_used is null and id_type = val2;

update tbl set
id_used = 'Y',
col1 = val1,
id_used_date = now()
where id_type = val2
and id = result;

RETURN result;

END;

$BODY$
LANGUAGE plpgsql VOLATILE COST 100;

When I run this function in a loop over 1000 or more records, it just freezes and says "query is running". When I check my table, nothing is being updated. When I run it for one or two records, it runs fine.

Example of calling the function:

select get_result('123','idtype');

table columns:

id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)

id is the table index.
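
For reference, a sketch of the DDL (tbl stands in for the real table name; id is assumed to be the primary key):

CREATE TABLE tbl (
   id           character varying(200) PRIMARY KEY,
   col1         character varying(200),
   id_used      character varying(1),
   id_used_date timestamp without time zone,
   id_type      character(200)
);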

Can someone help?

Answer 1:

Most probably, you are running into race conditions. When you run your function 1000 times in quick succession in separate transactions, something like this happens:

T1            T2            T3            ...
SELECT min(id) -- id 1
              SELECT min(id)  -- id 1
                            SELECT min(id)  -- id 1
                                          ...
              Row id 1 locked, wait ...
                            Row id 1 locked, wait ...
UPDATE id 1
                                          ... 

COMMIT
              Wake up, UPDATE id 1 again!
              COMMIT
                            Wake up, UPDATE id 1 again!
                            COMMIT
                                          ... 
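
You can reproduce this by hand in two concurrent psql sessions. A sketch, assuming the table is named tbl and '1' is the lowest free id:

-- Session 1                        -- Session 2
BEGIN;
SELECT min(id) FROM tbl
WHERE  id_used IS NULL
AND    id_type = 'idtype'; -- id 1
                                    BEGIN;
                                    SELECT min(id) FROM tbl
                                    WHERE  id_used IS NULL
                                    AND    id_type = 'idtype'; -- also id 1!
UPDATE tbl SET id_used = 'Y'
WHERE  id = '1';           -- locks the row
COMMIT;
                                    UPDATE tbl SET id_used = 'Y'
                                    WHERE  id = '1';  -- waits, then updates id 1 again
                                    COMMIT;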

Largely rewritten and simplified as an SQL function:

CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS 
$func$
   UPDATE tbl t
   SET    id_used = 'Y'
        , col1 = val1
        , id_used_date = now() 
   FROM  (
      SELECT id
      FROM   tbl
      WHERE  id_used IS NULL
      AND    id_type = val2
      ORDER  BY id
      LIMIT  1
      FOR    UPDATE   -- lock to avoid race condition! see below ...
      ) t1
   WHERE  t.id_type = val2
   -- AND    t.id_used IS NULL -- repeat condition (not if row is locked)
   AND    t.id = t1.id
   RETURNING  id;
$func$  LANGUAGE sql;
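
The call stays the same as before. Because this is a single-statement SQL function, it returns NULL when no free id is left for the given id_type, which the caller can test for. A minimal sketch:

SELECT get_result('123', 'idtype');  -- returns the claimed id, or NULL

DO
$$
DECLARE
   claimed text := get_result('123', 'idtype');
BEGIN
   IF claimed IS NULL THEN
      RAISE NOTICE 'no free id of this type left';
   END IF;
END
$$;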

Related question with a lot more explanation:

  • Atomic UPDATE .. SELECT in Postgres

Explanation

  • Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.

  • You don't need PL/pgSQL for this simple task. You can still use PL/pgSQL if you prefer; the UPDATE stays the same.

  • You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you had because, per the documentation:

The locking clauses cannot be used in contexts where returned rows cannot be clearly identified with individual table rows; for example they cannot be used with aggregation.

  • Bold emphasis mine. Luckily, you can easily replace min(id) with the equivalent ORDER BY / LIMIT 1 provided above; it can use an index just as well. See the snippet after this list.

  • If the table is big, you need an index on id at least. Assuming id is already indexed as the PRIMARY KEY, that helps. But this additional partial multicolumn index would probably help a lot more:

    CREATE INDEX foo_idx ON tbl (id_type, id)
    WHERE id_used IS NULL;
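
To illustrate the restriction and the workaround side by side (a sketch against the same tbl):

-- Not allowed: row locking cannot be combined with aggregation
SELECT min(id)
FROM   tbl
WHERE  id_used IS NULL
AND    id_type = 'idtype'
FOR    UPDATE;  -- raises an error

-- Equivalent lockable form; can use the partial index above
SELECT id
FROM   tbl
WHERE  id_used IS NULL
AND    id_type = 'idtype'
ORDER  BY id
LIMIT  1
FOR    UPDATE;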
    

Alternative solutions

Advisory locks may be the superior approach here:

  • Postgres UPDATE ... LIMIT 1
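
One simple variant as a sketch (my own illustration, not necessarily what the linked answer does): serialize all claimers of one id_type with a transaction-level advisory lock, deriving the lock key with hashtext():

BEGIN;
-- every transaction claiming this id_type queues behind the same advisory lock
SELECT pg_advisory_xact_lock(hashtext('idtype'));
SELECT get_result('123', 'idtype');
COMMIT;  -- the advisory lock is released automatically at commit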

Or you may want to lock many rows at once:

  • How to mark certain nr of rows in table on concurrent access