Insert, on duplicate update in PostgreSQL?

2018-12-31 00:32发布

Several months ago I learned from an answer on Stack Overflow how to perform multiple updates at once in MySQL using the following syntax:

INSERT INTO table (id, field, field2) VALUES (1, A, X), (2, B, Y), (3, C, Z)
ON DUPLICATE KEY UPDATE field=VALUES(Col1), field2=VALUES(Col2);

I've now switched over to PostgreSQL and apparently this is not correct. It's referring to all the correct tables so I assume it's a matter of different keywords being used but I'm not sure where in the PostgreSQL documentation this is covered.

To clarify, I want to insert several things and if they already exist to update them.

16条回答
十年一品温如言
2楼-- · 2018-12-31 01:01

Personally, I've set up a "rule" attached to the insert statement. Say you had a "dns" table that recorded dns hits per customer on a per-time basis:

CREATE TABLE dns (
    "time" timestamp without time zone NOT NULL,
    customer_id integer NOT NULL,
    hits integer
);

You wanted to be able to re-insert rows with updated values, or create them if they didn't exist already. Keyed on the customer_id and the time. Something like this:

CREATE RULE replace_dns AS 
    ON INSERT TO dns 
    WHERE (EXISTS (SELECT 1 FROM dns WHERE ((dns."time" = new."time") 
            AND (dns.customer_id = new.customer_id)))) 
    DO INSTEAD UPDATE dns 
        SET hits = new.hits 
        WHERE ((dns."time" = new."time") AND (dns.customer_id = new.customer_id));

Update: This has the potential to fail if simultaneous inserts are happening, as it will generate unique_violation exceptions. However, the non-terminated transaction will continue and succeed, and you just need to repeat the terminated transaction.

However, if there are tons of inserts happening all the time, you will want to put a table lock around the insert statements: SHARE ROW EXCLUSIVE locking will prevent any operations that could insert, delete or update rows in your target table. However, updates that do not update the unique key are safe, so if you no operation will do this, use advisory locks instead.

Also, the COPY command does not use RULES, so if you're inserting with COPY, you'll need to use triggers instead.

查看更多
弹指情弦暗扣
3楼-- · 2018-12-31 01:02

According the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.

查看更多
萌妹纸的霸气范
4楼-- · 2018-12-31 01:03

PostgreSQL since version 9.5 has UPSERT syntax, with ON CONFLICT clause. with the following syntax (similar to MySQL)

INSERT INTO the_table (id, column_1, column_2) 
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON CONFLICT (id) DO UPDATE 
  SET column_1 = excluded.column_1, 
      column_2 = excluded.column_2;

Searching postgresql's email group archives for "upsert" leads to finding an example of doing what you possibly want to do, in the manual:

Example 38-2. Exceptions with UPDATE/INSERT

This example uses exception handling to perform either UPDATE or INSERT, as appropriate:

CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        -- note that "a" must be unique
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');

There's possibly an example of how to do this in bulk, using CTEs in 9.1 and above, in the hackers mailing list:

WITH foos AS (SELECT (UNNEST(%foo[])).*)
updated as (UPDATE foo SET foo.a = foos.a ... RETURNING foo.id)
INSERT INTO foo SELECT foos.* FROM foos LEFT JOIN updated USING(id)
WHERE updated.id IS NULL;

See a_horse_with_no_name's answer for a clearer example.

查看更多
流年柔荑漫光年
5楼-- · 2018-12-31 01:07

UPDATE will return the number of modified rows. If you use JDBC (Java), you can then check this value against 0 and, if no rows have been affected, fire INSERT instead. If you use some other programming language, maybe the number of the modified rows still can be obtained, check documentation.

This may not be as elegant but you have much simpler SQL that is more trivial to use from the calling code. Differently, if you write ten line script in PL/PSQL, you probably should have unit test of one or another kind just for it.

查看更多
君临天下
6楼-- · 2018-12-31 01:08

For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I'd suggest looking into http://mbk.projects.postgresql.org

The current best practice that I'm aware of is:

  1. COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
  2. Acquire Lock [optional] (advisory is preferable to table locks, IMO)
  3. Merge. (the fun part)
查看更多
与风俱净
7楼-- · 2018-12-31 01:10

In PostgreSQL 9.5 and newer you can use INSERT ... ON CONFLICT UPDATE.

See the documentation.

A MySQL INSERT ... ON DUPLICATE KEY UPDATE can be directly rephrased to a ON CONFLICT UPDATE. Neither is SQL-standard syntax, they're both database-specific extensions. There are good reasons MERGE wasn't used for this, a new syntax wasn't created just for fun. (MySQL's syntax also has issues that mean it wasn't adopted directly).

e.g. given setup:

CREATE TABLE tablename (a integer primary key, b integer, c integer);
INSERT INTO tablename (a, b, c) values (1, 2, 3);

the MySQL query:

INSERT INTO tablename (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=c+1;

becomes:

INSERT INTO tablename (a, b, c) values (1, 2, 10)
ON CONFLICT (a) DO UPDATE SET c = tablename.c + 1;

Differences:

  • You must specify the column name (or unique constraint name) to use for the uniqueness check. That's the ON CONFLICT (columnname) DO

  • The keyword SET must be used, as if this was a normal UPDATE statement

It has some nice features too:

  • You can have a WHERE clause on your UPDATE (letting you effectively turn ON CONFLICT UPDATE into ON CONFLICT IGNORE for certain values)

  • The proposed-for-insertion values are available as the row-variable EXCLUDED, which has the same structure as the target table. You can get the original values in the table by using the table name. So in this case EXCLUDED.c will be 10 (because that's what we tried to insert) and "table".c will be 3 because that's the current value in the table. You can use either or both in the SET expressions and WHERE clause.

For background on upsert see How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?

查看更多
登录 后发表回答