When entering the following command:
\copy mmcompany from '<path>/mmcompany.txt' delimiter ',' csv;
I get the following error:
ERROR: duplicate key value violates unique constraint "mmcompany_phonenumber_key"
I understand why it's happening, but how do I execute the command in a way that valid entries will be inserted and ones that create an error will be discarded?
The reason PostgreSQL doesn't do this is related to how it implements constraints and validation. When a constraint fails it causes a transaction abort. The transaction is in an unclean state and cannot be resumed.
It is possible to create a new subtransaction for each row but this is very slow and defeats the purpose of using COPY
in the first place, so it isn't supported by PostgreSQL in COPY
at this time. You can do it yourself in PL/PgSQL with a BEGIN ... EXCEPTION
block inside a LOOP
over a select from the data copied into a temporary table. This works fairly well but can be slow.
It's better, if possible, to use SQL to check the constraints before doing any insert that violates them. That way you can just:
CREATE TEMPORARY TABLE stagingtable(...);
\copy stagingtable FROM 'somefile.csv'
INSERT INTO realtable
SELECT * FROM stagingtable
WHERE check_constraints_here;
Do keep concurrency issues in mind though. If you're trying to do a merge/upsert via COPY
you must LOCK TABLE realtable;
at the start of your transaction or you will still have the potential for errors. It looks like that's what you're trying to do - a copy if not exists
. If so, skipping errors is absolutely the wrong approach. See:
- How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
- Insert, on duplicate update in PostgreSQL?
- Postgresql - Clean way to insert records if they don't exist, update if they do
- Can COPY be used with a function?
- Postgresql csv importation that skips rows
... this is a much-discussed issue.
One way to handle the constraint violations is to define triggers on the target table to handle the errors. This is not ideal as there can still be race conditions (if concurrently loading), and triggers have pretty high overhead.
Another method: COPY into a staging table and load the data into the target table using SQL with some handling to skip existing entries.
Additionally, another useful method is to use pgloader