Hi good people of stackoverflow,
I am working on a trigger for a table on SQL SERVER 2008 R2 for auditing purpose which should add in a timestamp for the UPDATE_TS field before the UPDATE query is sent for execution. The result being that the update occurs with the original values to update on query plus the additional value of the UPDATE_TS as set by the trigger.
I have as well edited this question since I hear that inner-joins are not very heavy in terms of performance on triggers in comparison to not using them. I am not sure if this will add an additional overhead on the trigger as opposed to avoiding inner join in the trigger.
The example I am working on is below. Thank you for any help and suggestions!
Example Table is called MY_TABLE:
CREATE TABLE [myschema].[MY_TABLE](
[MY_TABLE_ID] [bigint] IDENTITY(1,1) NOT NULL,
[FIELD_TO_UPDATE] [varchar](255) NOT NULL,
[CREATE_TS] [datetime] NULL,
[UPDATE_TS] [datetime] NULL),
PRIMARY KEY (MY_TABLE_ID))
TRIGGER to create:
CREATE TRIGGER [myschema].[my_table_update_ts_trigger] ON [mydb].[myschema].[MY_TABLE]
INSTEAD OF UPDATE
AS
BEGIN
UPDATE INTO MY_TABLE ([FIELD_TO_UPDATE],[UPDATE_TS])
SELECT ins.FIELD_TO_UPDATE, GETDATE() FROM INSERTED as ins
END
You need to identify the row(s) you need to update, and you do this with a join or semi-join. It's not going to get much more efficient than this, unless you simply don't perform the update at all:
CREATE TRIGGER [myschema].[my_table_update_ts_trigger]
ON [myschema].[MY_TABLE]
INSTEAD OF UPDATE
AS
BEGIN
UPDATE t SET
FIELD_TO_UPDATE = i.FIELD_TO_UPDATE,
UPDATE_TS = CURRENT_TIMESTAMP
FROM myschema.MY_TABLE AS t
INNER JOIN inserted AS i
ON t.MY_TABLE_ID = i.MY_TABLE_ID;
END
GO
Here is the execution plan:
Since you need to match the rows in inserted
to your base table, and since there may be more than one row that gets updated by any operation (triggers fire per statement in SQL Server, not per row like in some other platforms), and since this isn't a BEFORE update but an INSTEAD OF update (meaning you still have to actually perform the UPDATE that would have happened without the trigger in place), you need to have output from both tables in order to perform the update accurately. This means you need a JOIN, and you cannot use a SEMI-JOIN (e.g. EXISTS), which probably still violates your outlandish requirements anyway. If you only needed to update the timestamp, you could do this:
UPDATE t SET UPDATE_TS = CURRENT_TIMESTAMP
FROM myschema.MY_TABLE AS t
WHERE EXISTS (SELECT 1 FROM inserted WHERE MY_TABLE_ID = t.MY_TABLE_ID);
Unfortunately, that will not work, because FIELD_TO_UPDATE
gets lost without actually pulling in the inserted
pseudo-table in a proper join.
Another way is to use a CROSS APPLY, e.g.:
UPDATE t SET
FIELD_TO_UPDATE = i.FIELD_TO_UPDATE,
UPDATE_TS = CURRENT_TIMESTAMP
FROM inserted AS i
CROSS APPLY myschema.MY_TABLE AS t
WHERE i.MY_TABLE_ID = t.MY_TABLE_ID;
It, too, is missing the nasty JOIN keyword, but it is still performing a JOIN. You can see this because the execution plans are identical:
Now, you can theoretically do this without a join, but that doesn't mean it will perform better. In fact I guarantee you beyond a shadow of a doubt that this will be less efficient, even though it does not contain a single four-letter word like JOIN:
DECLARE @NOW DATETIME = CURRENT_TIMESTAMP,
@MY_TABLE_ID INT,
@FIELD_TO_UPDATE VARCHAR(255);
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
SELECT MY_TABLE_ID, FIELD_TO_UPDATE FROM inserted;
OPEN c;
FETCH NEXT FROM c INTO @FIELD_TO_UPDATE, @MY_TABLE_ID;
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE myschema.MY_TABLE SET
FIELD_TO_UPDATE = @FIELD_TO_UPDATE,
UPDATE_TS = @NOW
WHERE MY_TABLE_ID = @MY_TABLE_ID;
FETCH NEXT FROM c INTO @FIELD_TO_UPDATE, @MY_TABLE_ID;
END
CLOSE c;
DEALLOCATE c;
That said, if you think even for a second that this solution is going to be faster than the one with joins, I have some swampland in Florida to sell you. There are multiple bridges on the property, too. I'm not even going to bother showing the execution plans for this one.
Let's also compare what happens in an INSTEAD OF INSERT trigger. Here is an example, probably similar to what you had:
CREATE TRIGGER myschema.ins_my_table
ON myschema.MY_TABLE
INSTEAD OF INSERT
AS
INSERT myschema.MY_TABLE(FIELD_TO_UPDATE, CREATE_TS)
SELECT FIELD_TO_UPDATE, CURRENT_TIMESTAMP FROM inserted;
GO
This, too, will produce a plan that looks like two queries were executed:
It is important to note that an INSTEAD OF trigger cancels the original update, and you are responsible for issuing your own (even though the plan still shows two queries).
One final option would be to use an AFTER trigger instead of an INSTEAD OF trigger. This will allow you to update the timestamp without the JOIN, because the FIELD_TO_UPDATE has already been updated. But in this case you really will see two queries, and two queries will really be executed (it won't just look that way in the plans).
Some general comments
Since i'm going for performance increase I do not want to use any inner joins in the code used for the trigger.
This doesn't really make much sense; why do you think joins are bad for performance? Sounds like you've watched too many NoSQL videos. Please don't discard technology because you've heard it was bad or because you had a slow join once. Create the query that makes sense, optimize when it doesn't perform well, and come for help when you can't optimize. In almost all cases (there are exceptions, of course), the problem is indexing or statistics and not the fact that you used a JOIN keyword. That doesn't mean you should avoid all joins in all queries at all costs.
If you only want not to see the word JOIN
, it's possbile, just write it like this.
CREATE TRIGGER [myschema].[my_table_update_ts_trigger]
ON [myschema].[MY_TABLE]
INSTEAD OF UPDATE
AS
BEGIN
UPDATE t
SET FIELD_TO_UPDATE = i.FIELD_TO_UPDATE,
UPDATE_TS = CURRENT_TIMESTAMP
FROM myschema.MY_TABLE AS t,
inserted AS i
WHERE t.MY_TABLE_ID = i.MY_TABLE_ID;
END
GO