Say I have a simple table that has the following fields:
- ID: int, autoincremental (identity), primary key
- Name: varchar(50), unique, has unique index
- Tag: int
I never use the ID field for lookups, because my application always works with the Name field.
I need to change the Tag value from time to time. I'm using the following trivial SQL code:
UPDATE Table SET Tag = XX WHERE Name = YY;
I wondered if anyone knows whether the above is always faster than:
DELETE FROM Table WHERE Name = YY;
INSERT INTO Table (Name, Tag) VALUES (YY, XX);
Again - I know that in the second example the ID is changed, but it does not matter for my application.
In your case, I believe the update will be faster.
Remember indexes!
You have defined a primary key, and it will likely become the clustered index automatically (at least SQL Server does this). A clustered index means the records are physically laid out on disk according to the index. The DELETE operation itself won't cause much trouble; even after one record goes away, the index stays correct. But when you INSERT a new record, the engine has to place it in the correct position, which under some circumstances causes a certain "reshuffling" of the old records to "make room" for the new one. That is where the operation slows down.
An index (especially a clustered one) works best when the values are ever increasing, so new records simply get appended at the tail. Maybe you can add an extra INT IDENTITY column to serve as the clustered index; that would simplify insert operations.
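To make this concrete, here is a minimal sketch, assuming SQL Server; the table and constraint names are illustrative, not from the question. The identity column is the clustered index, so new rows append at the tail, while the unique index on Name stays nonclustered and serves the lookups:

CREATE TABLE MyTable (
    ID   INT IDENTITY(1,1) NOT NULL,
    Name VARCHAR(50) NOT NULL,
    Tag  INT NOT NULL,
    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (ID),       -- ever-increasing key: inserts go to the tail
    CONSTRAINT UQ_MyTable_Name UNIQUE NONCLUSTERED (Name)   -- lookup index for WHERE Name = ...
);

With this layout, DELETE+INSERT still has to maintain both indexes, while an UPDATE of Tag touches neither key.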
Every write to the database has lots of potential side effects.
- Delete: a row must be removed, indexes updated, foreign keys checked and possibly cascade-deleted, etc.
- Insert: a row must be allocated - this might be in place of a deleted row, or it might not; indexes must be updated, foreign keys checked, etc.
- Update: one or more values must be updated; perhaps the row's data no longer fits into its block of the database, so more space must be allocated, which may cascade into multiple blocks being re-written or lead to fragmented blocks; if the value has foreign key constraints, they must be checked, etc.
For a very small number of columns, or if the whole row is updated, delete+insert might be faster, but the FK constraint problem is a big one. Sure, maybe you have no FK constraints now, but will that always be true? And if you have a trigger, it's easier to write code that handles updates if the update operation is truly an update.
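As an illustration of the FK point, here is a minimal sketch assuming SQL Server; MyTable stands in for the question's table, and the child table is hypothetical. A plain UPDATE of Tag never touches the referenced key, whereas the DELETE half of delete+insert hits the constraint (or cascades) as soon as a child row exists:

-- Hypothetical child table referencing the question's identity key.
CREATE TABLE OrderLine (
    OrderLineID INT IDENTITY(1,1) PRIMARY KEY,
    MyTableID   INT NOT NULL,
    CONSTRAINT FK_OrderLine_MyTable FOREIGN KEY (MyTableID) REFERENCES MyTable (ID)
);

-- Safe: the referenced ID never changes.
UPDATE MyTable SET Tag = 42 WHERE Name = 'acme';

-- Fails with a constraint violation (or cascade-deletes) once any OrderLine row points here.
DELETE FROM MyTable WHERE Name = 'acme';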
Another issue to think about is that sometimes inserting and deleting hold different locks than updating. The DB might lock the entire table while you are inserting or deleting, as opposed to just locking a single record while you are updating that record.
In the end, I'd suggest just updating a record if you mean to update it. Then check your DB's performance statistics and the statistics for that table to see if there are performance improvements to be made. Anything else is premature.
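If you are on SQL Server, one way to do that check is the index DMVs; a minimal sketch, with dbo.MyTable as an illustrative name:

-- Fragmentation and size per index of the table; high avg_fragmentation_in_percent on the
-- clustered index is the kind of evidence that would justify rethinking the write pattern.
SELECT ips.index_id,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'LIMITED') AS ips;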
An example from the ecommerce system I work on: we were storing credit-card transaction data in the database using a two-step approach: first, write a partial transaction record to indicate that we've started the process; then, when the authorization data comes back from the bank, update the record. We COULD have deleted and then re-inserted the record, but instead we just used UPDATE. Our DBA told us that the table was fragmented because the DB was only allocating a small amount of space for each row, and the update caused block chaining since it added a lot of data. However, rather than switching to DELETE+INSERT, we just tuned the database to always allocate the whole row, which means the update could use the pre-allocated empty space with no problems. No code change was required, and the code remains simple and easy to understand.
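The two-step pattern looks roughly like this; a minimal sketch with illustrative table, column, and parameter names (not our actual schema):

-- Step 1: record that the transaction has started.
INSERT INTO CardTransaction (OrderId, Status, AuthCode)
VALUES (@OrderId, 'PENDING', NULL);

-- Step 2: when the bank responds, fill in the authorization data in place (no delete/re-insert).
UPDATE CardTransaction
SET Status = 'AUTHORIZED',
    AuthCode = @AuthCode
WHERE OrderId = @OrderId;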
A bit too late with this answer, but since I faced a similar question, I ran a test with JMeter against a MySQL server on the same machine, comparing the two approaches.
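Roughly, each loop is assumed to have executed one of the two sequences below (MySQL syntax; table name and values are illustrative):

-- Sequence A: delete + insert
DELETE FROM MyTable WHERE Name = 'some-name';
INSERT INTO MyTable (Name, Tag) VALUES ('some-name', 42);

-- Sequence B: update in place
UPDATE MyTable SET Tag = 42 WHERE Name = 'some-name';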
After running the test for 500 loops, I obtained the following results:
- DELETE + INSERT: average 62 ms
- UPDATE: average 30 ms
It depends on the product. A product could be implemented that (under the covers) converts all UPDATEs into a transactionally wrapped DELETE and INSERT, provided the results are consistent with the UPDATE semantics.
I'm not saying I'm aware of any product that does this, but it's perfectly legal.
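This is not a claim about any real product; it just sketches what such a rewrite would have to preserve (MySQL-style syntax, illustrative names and values):

-- The user writes:
UPDATE MyTable SET Tag = 42 WHERE Name = 'some-name';

-- Under the covers, the product could legally execute:
START TRANSACTION;
DELETE FROM MyTable WHERE Name = 'some-name';
INSERT INTO MyTable (ID, Name, Tag)
VALUES (7 /* the original row's ID, preserved */, 'some-name', 42);
COMMIT;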
In specific cases, delete+insert can save you time. I have a table with around 30,000 rows, and its records are updated or inserted daily from a data file. The upload process generates 95% UPDATE statements, because the records are already there, and 5% INSERTs for the ones that do not exist. Alternatively, loading the data file into a temp table, deleting the destination-table rows that appear in the temp table, and then inserting those rows from the temp table has shown a 50% gain in time.
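A minimal sketch of that staged load, assuming SQL Server temp tables and illustrative table and column names, with the data file already bulk-loaded into #Staging(Name, Tag):

-- Remove the destination rows that are about to be replaced, in one set-based delete.
DELETE d
FROM DestTable AS d
INNER JOIN #Staging AS s ON s.Name = d.Name;

-- Re-insert everything from the staging table, covering both the 95% "updates" and the 5% new rows.
INSERT INTO DestTable (Name, Tag)
SELECT s.Name, s.Tag
FROM #Staging AS s;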
What if you have a few million rows? Each row starts with one piece of data, perhaps a client name. As you collect data for clients, their entries must be updated. Now, let's assume that the client data is collected on numerous other machines, from which it is later gathered and put into the database. If each client has unique information, you would not be able to perform a bulk update; i.e., there is no WHERE-clause criterion you could use to update multiple clients in one shot. On the other hand, you could perform bulk inserts. So the question might be better posed as follows: is it better to perform millions of single updates, or to compile them into large bulk deletes and inserts? In other words, instead of running "update [table] set field=data where clientid=123" a million times, you do "delete from [table] where clientid in ([all clients to be updated]); insert into [table] values (data for client1), (data for client2), etc.".
Is either choice better than the other, or are you screwed both ways?
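A minimal sketch of the two patterns being contrasted above (MySQL syntax; table, column, and client values are illustrative):

-- Pattern 1: one single-row update per client, issued millions of times.
UPDATE ClientData SET Field = 'data-for-123' WHERE ClientId = 123;

-- Pattern 2: batch the changed clients into one bulk delete plus one multi-row insert.
DELETE FROM ClientData WHERE ClientId IN (123, 124, 125);
INSERT INTO ClientData (ClientId, Field) VALUES
    (123, 'data-for-123'),
    (124, 'data-for-124'),
    (125, 'data-for-125');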