Best practice to check if a variable exists in plp

Posted 2019-09-06 17:08

Question:

For instance, I have a stored procedure that imports data from CSV files and writes it into a SQL table defined as follows:

CREATE TABLE person (id int, name text, age int, married boolean); 

First I check whether the record already exists: if it does, I update it; if it doesn't, I insert it. Each field of the record may have a different type, so the result of the SQL command is assigned to a list of scalar variables:

SELECT name, age, married INTO v_name, v_age, v_married [..]
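The check-then-write logic described above can be sketched in PL/pgSQL. This is a minimal sketch under stated assumptions: a throwaway temporary copy of the `person` table and hard-coded sample values (hypothetical) in place of one CSV record. Instead of a separate existence check it uses a common variant: run the UPDATE first and test the special variable FOUND, which is true when the UPDATE touched at least one row.

```sql
-- Temporary stand-in for the question's table (assumes a fresh session)
CREATE TEMP TABLE person (id int, name text, age int, married boolean);

DO $$
DECLARE
   -- hypothetical sample values standing in for one CSV record
   v_id      int     := 1;
   v_name    text    := 'Ann';
   v_age     int     := 30;
   v_married boolean := NULL;   -- NULL is allowed in every column
BEGIN
   UPDATE person
   SET    name = v_name, age = v_age, married = v_married
   WHERE  id = v_id;

   -- FOUND is false if the UPDATE matched no row, so the record is new
   IF NOT FOUND THEN
      INSERT INTO person (id, name, age, married)
      VALUES (v_id, v_name, v_age, v_married);
   END IF;
END
$$;
```

Running the DO block a second time updates the existing row instead of inserting a duplicate. With concurrent writers this pattern can still race (two sessions may both insert), and the table as defined has no unique constraint on id to prevent that.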

Let's assume every column is optional (NULL allowed). What, then, is the best way to check which of the variables (v_name, v_age, v_married) are not NULL and can be processed?

I've found many solutions:

  • IF NOT FOUND THEN
  • WHEN NO_DATA_FOUND THEN
  • IF v_age IS NOT NULL THEN [...]
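The three checks listed above answer different questions, which a small sketch makes visible. It assumes a fresh session with a temporary `person` table holding one made-up row whose age is NULL. A plain SELECT INTO never raises an error when no row matches; it sets the variables to NULL and FOUND to false. NO_DATA_FOUND is raised only by SELECT INTO STRICT. Once a row is found, each column still has to be tested with IS NULL / IS NOT NULL on its own.

```sql
-- Temporary stand-in table with one row whose age is NULL (assumes a fresh session)
CREATE TEMP TABLE person (id int, name text, age int, married boolean);
INSERT INTO person VALUES (1, 'Ann', NULL, true);

DO $$
DECLARE
   v_name    text;
   v_age     int;
   v_married boolean;
BEGIN
   -- 1) Plain SELECT INTO: no exception if no row matches;
   --    the variables become NULL and FOUND is false
   SELECT name, age, married INTO v_name, v_age, v_married
   FROM   person WHERE id = 42;
   IF NOT FOUND THEN
      RAISE NOTICE 'no row with id 42';
   END IF;

   -- 2) SELECT INTO STRICT raises NO_DATA_FOUND (or TOO_MANY_ROWS),
   --    which an exception handler in a nested block can catch
   BEGIN
      SELECT name, age, married INTO STRICT v_name, v_age, v_married
      FROM   person WHERE id = 42;
   EXCEPTION
      WHEN NO_DATA_FOUND THEN
         RAISE NOTICE 'STRICT: no row with id 42';
   END;

   -- 3) The row exists, but single columns may still be NULL:
   --    FOUND is true, so each variable must be tested individually
   SELECT name, age, married INTO v_name, v_age, v_married
   FROM   person WHERE id = 1;
   IF FOUND AND v_age IS NULL THEN
      RAISE NOTICE 'row 1 exists, but age is NULL';
   END IF;
END
$$;
```

So FOUND (or STRICT with NO_DATA_FOUND) tells whether the record exists, while IS NOT NULL on a variable cannot distinguish "no row" from "row with a NULL column".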

or the dynamic solution I'm using now, which applies the last approach above when I have to check multiple columns:

list_of_columns := ARRAY['name','age','married'];
FOREACH x IN ARRAY list_of_columns LOOP
   -- %I quotes the column name as an identifier; v_id is passed as a parameter
   EXECUTE format('SELECT %I::text FROM person WHERE id = $1', x)
      INTO  y
      USING v_id;

   -- y holds the current value as text (y is assumed to be declared as text);
   -- IS DISTINCT FROM is also true when exactly one side is NULL
   IF  x = 'name' AND y IS DISTINCT FROM v_name THEN
     UPDATE person
     SET    name = v_name
     WHERE  id = v_id;

   ELSIF x = 'age' AND y IS DISTINCT FROM v_age::text THEN
     UPDATE person
     SET    age = v_age
     WHERE  id = v_id;

   ELSIF x = 'married' AND y IS DISTINCT FROM v_married::text THEN
     UPDATE person
     SET    married = v_married
     WHERE  id = v_id;
   END IF;
END LOOP;
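A plain `!=` (or `<>`) comparison returns NULL rather than true when either side is NULL, which is why per-column checks like these need an extra `IS NULL` test; `IS DISTINCT FROM` handles NULL directly. The sketch below shows that first and then, as an alternative to the loop, a single static UPDATE per record using a row-wise comparison. It assumes a fresh session, a temporary `person` table, and hypothetical sample values.

```sql
-- <> (same as !=) yields NULL when either side is NULL; IS DISTINCT FROM never does
SELECT 'Ann'::text <> NULL               AS ne_result        -- NULL
     , 'Ann'::text IS DISTINCT FROM NULL AS distinct_result  -- true
     , NULL::text IS DISTINCT FROM NULL  AS both_null;       -- false

-- Temporary stand-in table (assumes a fresh session)
CREATE TEMP TABLE person (id int, name text, age int, married boolean);
INSERT INTO person VALUES (1, 'Ann', 30, NULL);

DO $$
DECLARE
   -- hypothetical values read from one CSV record
   v_id      int     := 1;
   v_name    text    := 'Ann';
   v_age     int     := 31;
   v_married boolean := NULL;
BEGIN
   -- one static UPDATE instead of a dynamic SELECT plus one UPDATE per column;
   -- the row-wise comparison skips the write when nothing would change
   UPDATE person
   SET    name = v_name, age = v_age, married = v_married
   WHERE  id = v_id
   AND    (name, age, married) IS DISTINCT FROM (v_name, v_age, v_married);
END
$$;
```

Static SQL also avoids building query strings, so there is no quoting to get wrong and the statement can be planned once.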

I'm looking for the best solution with regard to best practice and performance. Any help is appreciated!

Answer 1:

I think you can radically improve the whole procedure along these lines:

BEGIN;

CREATE TEMP TABLE tmp_p ON COMMIT DROP AS
SELECT * FROM person LIMIT 0;

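-- server-side file: needs superuser or (PostgreSQL 11+) pg_read_server_files;
-- from psql, \copy reads a client-side file instead
COPY tmp_p FROM '/absolute/path/to/file' (FORMAT csv);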

UPDATE person p
SET    name    = t.name
      ,age     = t.age
      ,married = t.married
FROM   tmp_p t
WHERE  p.id = t.id
AND   (p.name    IS DISTINCT FROM t.name OR
       p.age     IS DISTINCT FROM t.age  OR
       p.married IS DISTINCT FROM t.married);

INSERT INTO person (id, name, age, married, ...)
SELECT id, name, age, married, ...
FROM   tmp_p t
WHERE  NOT EXISTS (SELECT 1 FROM person x WHERE x.id = t.id);

COMMIT; -- drops temp table because of ON COMMIT DROP
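To try the script without a CSV file, the COPY step can be replaced by inserting a few rows into the temp table by hand. This sketch assumes a fresh session, a temporary `person` table, and made-up sample rows; the UPDATE and INSERT statements follow the script above.

```sql
-- Temporary stand-in target table with existing rows (assumes a fresh session)
CREATE TEMP TABLE person (id int, name text, age int, married boolean);
INSERT INTO person VALUES (1, 'Ann', 30, false), (2, 'Bob', 40, true);

BEGIN;

CREATE TEMP TABLE tmp_p ON COMMIT DROP AS
SELECT * FROM person LIMIT 0;

-- stand-in for COPY: rows that would normally come from the CSV file
INSERT INTO tmp_p VALUES
   (1, 'Ann', 31, false)    -- changed age -> updated
 , (2, 'Bob', 40, true)     -- unchanged   -> skipped by IS DISTINCT FROM
 , (3, 'Cid', NULL, NULL);  -- new id      -> inserted

UPDATE person p
SET    name    = t.name
      ,age     = t.age
      ,married = t.married
FROM   tmp_p t
WHERE  p.id = t.id
AND   (p.name    IS DISTINCT FROM t.name OR
       p.age     IS DISTINCT FROM t.age  OR
       p.married IS DISTINCT FROM t.married);

INSERT INTO person (id, name, age, married)
SELECT id, name, age, married
FROM   tmp_p t
WHERE  NOT EXISTS (SELECT 1 FROM person x WHERE x.id = t.id);

COMMIT;  -- tmp_p is dropped here
```

The UPDATE touches only row 1 and the INSERT adds only row 3, so unchanged rows cost no writes.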

Explanation

  • COPY your CSV file into a temporary table with a matching layout. I copied the layout of the target table with CREATE TEMP TABLE ... AS SELECT ... LIMIT 0; you may need to adapt it to the layout of your CSV file.

  • UPDATE existing rows. The last three lines of the WHERE clause skip empty updates (rows where nothing would change); unlike <>, IS DISTINCT FROM also treats NULL as a comparable value.
    If you want to skip NULL values in the UPDATE (do you really?), use expressions like COALESCE(t.name, p.name), which fall back to the existing value when the imported value is NULL. (That may be useful, but it wasn't actually in your code.)

  • INSERT rows that don't exist yet, using a NOT EXISTS semi-join.

  • Everything runs in one transaction, so you don't end up with a half-baked result if something fails along the way. The temp table is dropped at the end of the transaction because it was created with ON COMMIT DROP.
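The COALESCE variant mentioned in the UPDATE point could look like the following sketch. It assumes a fresh session, a temporary `person` table, and a made-up import row. The filter against empty updates has to compare with the same COALESCE expressions; otherwise a NULL in the import would still count as a change and trigger a useless write.

```sql
-- Temporary stand-ins for the target table and the imported data (assumes a fresh session)
CREATE TEMP TABLE person (id int, name text, age int, married boolean);
INSERT INTO person VALUES (1, 'Ann', 30, true);

CREATE TEMP TABLE tmp_p AS SELECT * FROM person LIMIT 0;
INSERT INTO tmp_p VALUES (1, NULL, 31, NULL);  -- only age is provided

-- a NULL in the import keeps the existing value instead of overwriting it
UPDATE person p
SET    name    = COALESCE(t.name,    p.name)
      ,age     = COALESCE(t.age,     p.age)
      ,married = COALESCE(t.married, p.married)
FROM   tmp_p t
WHERE  p.id = t.id
-- compare with the same COALESCE expressions, so NULL input is not seen as a change
AND   (p.name    IS DISTINCT FROM COALESCE(t.name,    p.name) OR
       p.age     IS DISTINCT FROM COALESCE(t.age,     p.age)  OR
       p.married IS DISTINCT FROM COALESCE(t.married, p.married));
```

Here only age changes; name and married keep their stored values.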