Assume a table structure of MyTable(KEY, datafield1, datafield2...)
.
Often I want to either update an existing record, or insert a new record if it doesn't exist.
Essentially:
IF (key exists)
run update command
ELSE
run insert command
What's the best performing way to write this?
Replace table and field names by whatever you need. Take care of the using ON condition. Then set the appropriate value (and type) for the variables on the DECLARE line.
Cheers.
MS SQL Server 2008 introduces the MERGE statement, which I believe is part of the SQL:2003 standard. As many have shown it is not a big deal to handle one row cases, but when dealing with large datasets, one needs a cursor, with all the performance problems that come along. The MERGE statement will be much welcomed addition when dealing with large datasets.
I usually do what several of the other posters have said with regard to checking for it existing first and then doing whatever the correct path is. One thing you should remember when doing this is that the execution plan cached by sql could be nonoptimal for one path or the other. I believe the best way to do this is to call two different stored procedures.
Now, I don't follow my own advice very often, so take it with a grain of salt.
See my detailed answer to a very similar previous question
@Beau Crawford's is a good way in SQL 2005 and below, though if you're granting rep it should go to the first guy to SO it. The only problem is that for inserts it's still two IO operations.
MS Sql2008 introduces
merge
from the SQL:2003 standard:Now it's really just one IO operation, but awful code :-(
Doing an if exists ... else ... involves doing two requests minimum (one to check, one to take action). The following approach requires only one where the record exists, two if an insert is required:
Does the race conditions really matter if you first try an update followed by an insert? Lets say you have two threads that want to set a value for key key:
Thread 1: value = 1
Thread 2: value = 2
Example race condition scenario
The other thread fails with insert (with error duplicate key) - thread 2.
But; in a multithreaded environment, the OS scheduler decides on the order of the thread execution - in the above scenario, where we have this race condition, it was the OS that decided on the sequence of execution. Ie: It is wrong to say that "thread 1" or "thread 2" was "first" from a system viewpoint.
When the time of execution is so close for thread 1 and thread 2, the outcome of the race condition doesn't matter. The only requirement should be that one of the threads should define the resulting value.
For the implementation: If update followed by insert results in error "duplicate key", this should be treated as success.
Also, one should of course never assume that value in the database is the same as the value you wrote last.