postgresql: \\copy method enter valid entries and

2019-02-26 20:56发布

问题:

When entering the following command:

\copy mmcompany from '<path>/mmcompany.txt' delimiter ',' csv;

I get the following error:

ERROR: duplicate key value violates unique constraint "mmcompany_phonenumber_key"

I understand why it's happening, but how do I execute the command in a way that valid entries will be inserted and ones that create an error will be discarded?

回答1:

The reason PostgreSQL doesn't do this is related to how it implements constraints and validation. When a constraint fails it causes a transaction abort. The transaction is in an unclean state and cannot be resumed.

It is possible to create a new subtransaction for each row but this is very slow and defeats the purpose of using COPY in the first place, so it isn't supported by PostgreSQL in COPY at this time. You can do it yourself in PL/PgSQL with a BEGIN ... EXCEPTION block inside a LOOP over a select from the data copied into a temporary table. This works fairly well but can be slow.

It's better, if possible, to use SQL to check the constraints before doing any insert that violates them. That way you can just:

CREATE TEMPORARY TABLE stagingtable(...);

\copy stagingtable FROM 'somefile.csv'

INSERT INTO realtable
SELECT * FROM stagingtable
WHERE check_constraints_here;

Do keep concurrency issues in mind though. If you're trying to do a merge/upsert via COPY you must LOCK TABLE realtable; at the start of your transaction or you will still have the potential for errors. It looks like that's what you're trying to do - a copy if not exists. If so, skipping errors is absolutely the wrong approach. See:

  • How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
  • Insert, on duplicate update in PostgreSQL?
  • Postgresql - Clean way to insert records if they don't exist, update if they do
  • Can COPY be used with a function?
  • Postgresql csv importation that skips rows

... this is a much-discussed issue.



回答2:

One way to handle the constraint violations is to define triggers on the target table to handle the errors. This is not ideal as there can still be race conditions (if concurrently loading), and triggers have pretty high overhead.

Another method: COPY into a staging table and load the data into the target table using SQL with some handling to skip existing entries.

Additionally, another useful method is to use pgloader