Our system runs on SQL Server 2000, and we are in the process of preparing for an upgrade to SQL Server 2008. We have a lot of trigger code where we need to detect a change in a given column and then operate on that column if it has changed.
Obviously SQL Server provides the UPDATE() and COLUMNS_UPDATED() functions, but these functions only tell you which columns have been implicated in the SQL statement, not which columns have actually changed.
To determine which columns have changed, you need code similar to the following (for a column that supports NULLs):
IF UPDATE(Col1)
    SELECT @col1_changed = COUNT(*)
    FROM Inserted i
    INNER JOIN Deleted d ON i.Table_ID = d.Table_ID
    WHERE ISNULL(i.Col1, '<unique null value>')
       != ISNULL(d.Col1, '<unique null value>')
This code needs to be repeated for every column you are interested in testing. You can then check the 'changed' value to determine whether or not to perform expensive operations. Of course, this code is itself problematic, as it only tells you that at least one value in the column has changed over all the rows that were modified.
You can test individual UPDATE statements with something like this:
UPDATE t
SET Col1 = CASE WHEN i.Col1 = d.Col1
                THEN t.Col1
                ELSE dbo.fnTransform(i.Col1) END
FROM Table t
INNER JOIN Inserted i ON t.Table_ID = i.Table_ID
INNER JOIN Deleted d ON i.Table_ID = d.Table_ID
... but this doesn't work well when you need to invoke a stored procedure. In those cases, as far as I can tell, you have to fall back on other approaches.
My question is whether anyone has insight (or, better yet, hard data) as to the best/cheapest approach to the problem of predicating a database operation in a trigger on whether a particular column value in a modified row has actually changed. Neither of the methods above seems ideal, and I was wondering if a better method exists.
Although HLGEM gave some good advice (see the answer below), it wasn't exactly what I needed. I've done quite a bit of testing over the past few days, and I figured I'd at least share the results here, given that it looks like no more information will be forthcoming.
I set up a table that was effectively a narrower subset (9 columns) of one of our system's primary tables, and populated it with production data so that it was as deep as our production version of the table.
I then duplicated that table, and on the first one wrote a trigger that attempted to detect every individual column change, and then predicated each column update on whether the data in that column had actually changed or not.
For the second table, I wrote a trigger that used extensive conditional CASE logic to do all the updates to all the columns in a single statement.
I then ran four tests, each against both indexed and non-indexed versions of the tables, and then repeated the whole suite on both SQL 2000 and SQL 2008 servers.
The results I got were fairly interesting:
The second method (a single UPDATE statement with hairy CASE logic in the SET clause) uniformly out-performed the individual change detection, to a greater or lesser extent depending on the test, with the single exception of a single-column change affecting many rows where the column was indexed, running on SQL 2000. Since in our particular case we don't do many narrow, deep updates like this, for my purposes the single-statement approach is definitely the way to go.
I'd be interested in hearing other people's results of similar types of tests, to see whether my conclusions are as universal as I suspect they are or whether they are specific to our particular configuration.
To get you started, here is the skeleton of the test script I used -- you'll obviously need to come up with other data to populate it with:
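(The skeleton below is a reconstruction sketched from the description above, not the original script: Test1/Test2, Table_ID, Col1/Col2 and dbo.fnTransform are placeholder names, and only 2 of the 9 columns are shown.)

    -- Two identical test tables, one per trigger style
    CREATE TABLE Test1 (Table_ID int PRIMARY KEY, Col1 varchar(50) NULL, Col2 varchar(50) NULL)
    CREATE TABLE Test2 (Table_ID int PRIMARY KEY, Col1 varchar(50) NULL, Col2 varchar(50) NULL)
    GO

    -- Method 1: detect each column's changes individually, then
    -- predicate each column's update on that result. (Recursive
    -- triggers are off by default, so the self-update doesn't re-fire.)
    CREATE TRIGGER tr_Test1_Update ON Test1 FOR UPDATE AS
    BEGIN
        DECLARE @col1_changed int
        SET @col1_changed = 0

        IF UPDATE(Col1)
            SELECT @col1_changed = COUNT(*)
            FROM Inserted i
            INNER JOIN Deleted d ON i.Table_ID = d.Table_ID
            WHERE ISNULL(i.Col1, '<null>') != ISNULL(d.Col1, '<null>')

        IF @col1_changed > 0
            UPDATE t
            SET Col1 = dbo.fnTransform(i.Col1)
            FROM Test1 t
            INNER JOIN Inserted i ON t.Table_ID = i.Table_ID
            INNER JOIN Deleted d ON i.Table_ID = d.Table_ID
            WHERE ISNULL(i.Col1, '<null>') != ISNULL(d.Col1, '<null>')

        -- ...repeated for Col2 (and Col3 through Col9 in the real test)...
    END
    GO

    -- Method 2: one UPDATE with per-column CASE logic
    CREATE TRIGGER tr_Test2_Update ON Test2 FOR UPDATE AS
    BEGIN
        UPDATE t
        SET Col1 = CASE WHEN ISNULL(i.Col1, '<null>') = ISNULL(d.Col1, '<null>')
                        THEN t.Col1 ELSE dbo.fnTransform(i.Col1) END,
            Col2 = CASE WHEN ISNULL(i.Col2, '<null>') = ISNULL(d.Col2, '<null>')
                        THEN t.Col2 ELSE dbo.fnTransform(i.Col2) END
        FROM Test2 t
        INNER JOIN Inserted i ON t.Table_ID = i.Table_ID
        INNER JOIN Deleted d ON i.Table_ID = d.Table_ID
    END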
I think you may want to investigate using the EXCEPT operator. It is a set-based operator that can weed out the rows that have not changed. The nice thing is that it treats NULL values as equal: it returns the rows in the first set (listed before the EXCEPT operator) that are not in the second set (listed after it). This handles the issue of columns that allow NULLs without the use of ISNULL() in the trigger, and it returns only the IDs of the rows with changes to Col1, for a nice set-based approach to detecting changes.
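Something along these lines (a sketch using the question's Table_ID/Col1 names; untested, as noted below):

    -- IDs of rows whose Col1 actually changed; EXCEPT compares NULLs as equal
    SELECT Table_ID
    FROM (SELECT d.Table_ID, d.Col1
          FROM Deleted d
          EXCEPT
          SELECT i.Table_ID, i.Col1
          FROM Inserted i) AS changed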
I haven't tested the approach, but it may well be worth your time. I think EXCEPT was introduced with SQL Server 2005.

I recommend using the EXCEPT set operator, as mentioned by Todd/arghtype above.
I have added this answer because I put the inserted before the deleted, so that INSERTs will be detected as well as UPDATEs; this way I can usually have one trigger to cover both inserts and updates. You can also detect deletes by adding OR (NOT EXISTS(SELECT * FROM inserted) AND EXISTS(SELECT * FROM deleted)).
It determines if a value has changed in only the columns specified. I have not investigated its performance compared with the other solutions but it is working well in my database.
It uses the EXCEPT set operator to return any rows from the left query that are not also found in the right query. This code can be used in INSERT, UPDATE and DELETE triggers.
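A sketch of the pattern being described (PKID is the key column explained below; Col1 and Col2 are placeholder columns):

    IF EXISTS (SELECT i.PKID, i.Col1, i.Col2
               FROM inserted i
               EXCEPT
               SELECT d.PKID, d.Col1, d.Col2
               FROM deleted d)
       -- add this to also fire for pure DELETEs:
       OR (NOT EXISTS (SELECT * FROM inserted)
           AND EXISTS (SELECT * FROM deleted))
    BEGIN
        -- ...trigger logic for the rows whose values changed...
        PRINT 'change detected'
    END

With inserted first, a plain INSERT is detected too: deleted is empty, so every inserted row survives the EXCEPT.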
The "PKID" column is the primary key. It is required to enable matching between the two sets. If you have multiple columns for the primary key then you will need to include all the columns to do correct matching between the inserted and deleted sets.
If you want to use the changed rows in subsequent trigger logic, I usually put the results of the EXCEPT query into a table variable that can be referenced later on.
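For example, something like this (again a sketch, using the placeholder names from above):

    DECLARE @changed TABLE (PKID int PRIMARY KEY)

    INSERT INTO @changed (PKID)
    SELECT PKID
    FROM (SELECT i.PKID, i.Col1, i.Col2
          FROM inserted i
          EXCEPT
          SELECT d.PKID, d.Col1, d.Col2
          FROM deleted d) AS changes

    -- subsequent trigger logic can then join to @changed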
I hope this is of interest :-)
There is another technique available in SQL Server 2008 for this kind of change tracking: see "Comparing Change Data Capture and Change Tracking" in the SQL Server documentation.
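For reference, enabling the lighter-weight Change Tracking feature looks like this (the database and table names are placeholders):

    -- Enable change tracking at the database level
    ALTER DATABASE MyDatabase
    SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)

    -- Then per table; TRACK_COLUMNS_UPDATED records which columns changed
    ALTER TABLE dbo.MyTable
    ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON)

Changed rows are then read outside of triggers via CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version).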
Let's start with this: I would never, and I mean never, invoke a stored proc in a trigger. To account for a multi-row insert you would have to cursor through the rows, calling the proc once for each one. This means the 200,000 rows you just loaded through a set-based query (say, updating all prices by 10%) might well lock the table for hours as the trigger tries valiantly to handle the load. Plus, if something changes in the proc, you could break all inserts to the table, or even completely hang it up. I'm a firm believer that trigger code should call nothing outside the trigger.
Personally, I prefer to simply do my task. If I have written the actions I want to take properly in the trigger, it will only update, delete or insert where columns have changed.
Example: suppose you want to update the last_name field that you are storing in two places, due to denormalization done for performance reasons.
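A sketch of what that trigger's UPDATE might look like (the person_detail table name and join key are assumptions, not the original code):

    -- Sync the denormalized last_name copy, touching only rows that differ
    UPDATE pd
    SET last_name = i.last_name
    FROM person_detail pd
    INNER JOIN inserted i ON pd.person_id = i.person_id
    WHERE pd.last_name <> i.last_name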
As you can see, it only updates the last names that differ from what is currently in the table being updated.
If you want to do auditing and record only those rows which changed, then do the comparison using all fields: something like WHERE i.field1 <> d.field1 OR i.field2 <> d.field2 (and so on through all the fields).
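A sketch of that shape (the audit table and field names are placeholders; for nullable columns, wrap the comparisons in ISNULL() as discussed in the question):

    -- Record only the rows where at least one field actually changed
    INSERT INTO audit_mytable (Table_ID, field1, field2, changed_date)
    SELECT i.Table_ID, i.field1, i.field2, GETDATE()
    FROM inserted i
    INNER JOIN deleted d ON i.Table_ID = d.Table_ID
    WHERE i.field1 <> d.field1
       OR i.field2 <> d.field2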