Given:
customer[id BIGINT AUTO_INCREMENT PRIMARY KEY, email VARCHAR(30), count INT]
I'd like to execute the following atomically: Update the customer if he already exists; otherwise, insert a new customer.
In theory this sounds like a perfect fit for SQL-MERGE but the database I am using doesn't support MERGE with AUTO_INCREMENT columns.
https://stackoverflow.com/a/1727788/14731 seems to indicate that if you execute a query or update statement against a non-existent row, the database will lock the index thereby preventing concurrent inserts.
Is this behavior guaranteed by the SQL standard? Are there any databases that do not behave this way?
UPDATE: Sorry, I should have mentioned this earlier: the solution must use READ_COMMITTED transaction isolation unless that is impossible in which case I will accept the use of SERIALIZABLE.
This question is asked about once a week on SO, and the answers are almost invariably wrong.
Here's the right one.
insert customer (email, count)
select 'foo@example.com', 0
where not exists (
select 1 from customer
where email = 'foo@example.com'
)
update customer set count = count + 1
where email = 'foo@example.com'
If you like, you can insert a count of 1 and skip the update
if the inserted rowcount -- however expressed in your DBMS -- returns 1.
The above syntax is absolutely standard and makes no assumption about locking mechanisms or isolation levels. If it doesn't work, your DBMS is broken.
Many people are under the mistaken impression that the select
executes "first" and thus introduces a race condition. No: that select
is part of the insert
. The insert is atomic. There is no race.
Answering my own question since there seems to be a lot of confusion around the topic. It seems that:
-- BAD! DO NOT DO THIS! --
insert customer (email, count)
select 'foo@example.com', 0
where not exists (
select 1 from customer
where email = 'foo@example.com'
)
is open to race-conditions (see Only inserting a row if it's not already there). From what I've been able to gather, the only portable solution to this problem:
- Pick a key to merge against. This could be the primary key, or another unique key, but it must have a unique constraint.
- Try to
insert
a new row. You must catch the error that will occur if the row already exists.
- The hard part is over. At this point, the row is guaranteed to exist and you are protected from race-conditions by the fact that you are holding a write-lock on it (due to the
insert
from the previous step).
- Go ahead and
update
if needed or select
its primary key.
Use Russell Fox's code but use SERIALIZABLE
isolation. This will take a range lock so that the non-existing row is logically locked (together with all other non-existing rows in the surrounding key range).
So it looks like this:
BEGIN TRAN
IF EXISTS (SELECT 1 FROM foo WITH (UPDLOCK, HOLDLOCK) WHERE [email] = 'thisemail')
BEGIN
UPDATE foo...
END
ELSE
BEGIN
INSERT INTO foo...
END
COMMIT
Most code taken from his answer, but fixed to provided mutual exclusion semantics.
IF EXISTS (SELECT 1 FROM foo WHERE [email] = 'thisemail')
BEGIN
UPDATE foo...
END
ELSE
BEGIN
INSERT INTO foo...
END