Why doesn't this rule prevent duplicate key violations?

Posted 2019-04-06 19:06

Question:

(PostgreSQL) I was trying to COPY CSV data into a table, but I kept getting duplicate key violation errors. There's no way to tell COPY to ignore those, so following internet wisdom I tried adding this rule:

CREATE OR REPLACE RULE ignore_duplicate_inserts AS
   ON INSERT TO mytable
   WHERE (EXISTS ( SELECT mytable.id
           FROM mytable
          WHERE mytable.id = new.id)) DO NOTHING;

to circumvent the problem, but I still get those errors - any ideas why ?

Answer 1:

Rules by default add things to the current action:

Roughly speaking, a rule causes additional commands to be executed when a given command on a given table is executed.

But an INSTEAD rule allows you to replace the action:

Alternatively, an INSTEAD rule can replace a given command by another, or cause a command not to be executed at all.

So, I think you want to specify INSTEAD:

CREATE OR REPLACE RULE ignore_duplicate_inserts AS
   ON INSERT TO mytable
   WHERE (EXISTS ( SELECT mytable.id
           FROM mytable
          WHERE mytable.id = new.id)) DO INSTEAD NOTHING;

Without the INSTEAD, your rule essentially says "do the INSERT and then do nothing", when what you want is "instead of the INSERT, do nothing". AFAIK, DO INSTEAD NOTHING expresses exactly that.
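
As a quick sanity check (a sketch; the table definition below is hypothetical, assuming id is the primary key), a plain INSERT of a duplicate id should then be rewritten to nothing rather than raising an error:

CREATE TABLE mytable (id integer PRIMARY KEY, val text);
-- (create the DO INSTEAD NOTHING rule shown above)
INSERT INTO mytable VALUES (1, 'first');   -- INSERT 0 1
INSERT INTO mytable VALUES (1, 'again');   -- rewritten by the rule: INSERT 0 0, no error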

I'm not an expert on PostgreSQL rules but I think adding the "INSTEAD" should work.

UPDATE: Thanks to araqnid we know that:

COPY FROM will invoke any triggers and check constraints on the destination table. However, it will not invoke rules

So a rule isn't going to work in this situation. However, triggers are fired during COPY FROM so you could write a BEFORE INSERT trigger that would return NULL when it detected duplicate rows:

It can return NULL to skip the operation for the current row. This instructs the executor to not perform the row-level operation that invoked the trigger (the insertion or modification of a particular table row).
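
For example (a minimal sketch, assuming the table is called mytable and its primary key column is id; the function and trigger names are made up here), a BEFORE INSERT trigger that silently drops duplicate rows could look like this:

CREATE OR REPLACE FUNCTION mytable_skip_duplicates() RETURNS trigger AS $$
BEGIN
    -- Skip this row if a row with the same id already exists.
    IF EXISTS (SELECT 1 FROM mytable WHERE id = NEW.id) THEN
        RETURN NULL;  -- returning NULL from a BEFORE trigger skips the row
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER skip_duplicates
    BEFORE INSERT ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_skip_duplicates();

Unlike a rule, this trigger does fire for each row processed by COPY FROM, so duplicates are skipped instead of aborting the whole load.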

That said, I think araqnid's approach of "load it all into a temporary table, clean it up, and copy it to the final destination" would be the more sensible solution for a bulk-loading operation like yours.



Answer 2:

COPY FROM will not invoke rules (http://www.postgresql.org/docs/9.0/interactive/sql-copy.html#AEN58860)

My approach would be to load the CSV data into a temp table, then use an INSERT ... SELECT statement to copy the data into the target table where it doesn't already exist. (If there are duplicates within the CSV data itself, remove those from the temp table first; see the deduplication sketch after the example.) Something like:

BEGIN;
CREATE TEMP TABLE stage_data(key_column, data_columns...) ON COMMIT DROP;
\copy stage_data from data.csv with csv header
-- prevent any other updates while we are merging input (omit this if you don't need it)
LOCK target_data IN SHARE ROW EXCLUSIVE MODE;
-- insert into target table
INSERT INTO target_data(key_column, data_columns...)
   SELECT key_column, data_columns...
   FROM stage_data
   WHERE NOT EXISTS (SELECT 1 FROM target_data
                     WHERE target_data.key_column = stage_data.key_column);
END;
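
To deduplicate the staging table itself before the INSERT ... SELECT, one possible sketch (key_column is the placeholder from the example above; ctid is just used to pick an arbitrary survivor per key) is:

-- keep one arbitrary row per key_column, delete the rest
DELETE FROM stage_data a
 USING stage_data b
 WHERE a.key_column = b.key_column
   AND a.ctid < b.ctid;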