(postgresql) I was trying to COPY
csv data into a table but I was getting duplicate key violation errors, and there's no way to tell COPY
to ignore those, so following internet wisdom I tried adding this rule:
CREATE OR REPLACE RULE ignore_duplicate_inserts AS
ON INSERT TO mytable
WHERE (EXISTS ( SELECT mytable.id
FROM mytable
WHERE mytable.id = new.id)) DO NOTHING;
to circumvent the problem, but I still get those errors - any ideas why ?
Rules by default add things to the current action:
Roughly speaking, a rule causes additional commands to be executed when a given command on a given table is executed.
But an INSTEAD rule allows you to replace the action:
Alternatively, an INSTEAD rule can replace a given command by another, or cause a command not to be executed at all.
So, I think you want to specify INSTEAD:
CREATE OR REPLACE RULE ignore_duplicate_inserts AS
ON INSERT TO mytable
WHERE (EXISTS ( SELECT mytable.id
FROM mytable
WHERE mytable.id = new.id)) DO INSTEAD NOTHING;
Without the INSTEAD, your rule is essentially saying "do the INSERT and then do nothing" when you want to say "instead of the INSERT, do nothing" and, AFAIK, the DO INSTEAD NOTHING
will do that.
I'm not an expert on PostgreSQL rules but I think adding the "INSTEAD" should work.
UPDATE: Thanks to araqnid we know that:
COPY FROM will invoke any triggers and check constraints on the destination table. However, it will not invoke rules
So a rule isn't going to work in this situation. However, triggers are fired during COPY FROM so you could write a BEFORE INSERT trigger that would return NULL when it detected duplicate rows:
It can return NULL to skip the operation for the current row. This instructs the executor to not perform the row-level operation that invoked the trigger (the insertion or modification of a particular table row).
That said, I think you'd be better off with araqnid's "load it all into a temporary table, clean it up, and copy it to the final destination" would be a more sensible solution for a bulk loading operation like you have.
COPY FROM
will not invoke rules (http://www.postgresql.org/docs/9.0/interactive/sql-copy.html#AEN58860)
My approach would be to load the CSV data into a temp table, then use an INSERT...SELECT
statement to copy the data into the target table where it doesn't already exist. (If there are duplicates in the CSV data itself, remove those from the temp table first). Something like:
BEGIN;
CREATE TEMP TABLE stage_data(key_column, data_columns...) ON COMMIT DROP;
\copy stage_data from data.csv with csv header
-- prevent any other updates while we are merging input (omit this if you don't need it)
LOCK target_data IN SHARE ROW EXCLUSIVE MODE;
-- insert into target table
INSERT INTO target_data(key_column, data_columns...)
SELECT key_column, data_columns...
FROM stage_data
WHERE NOT EXISTS (SELECT 1 FROM target_data
WHERE target_data.key_column = stage_data.key_column)
END;