I have situation where a "publisher" application essentially keeps a view model up to date by querying a VERY complex view and then merging the results into a denormalized view model table, using separate insert, update, and delete operations.
Now that we have upgraded to SQL 2008 I figured it would be a great time to update these with the SQL MERGE statement. However after writing the query, the subtree cost of the MERGE statement is 1214.54! With the old way, the sum of the Insert/Update/Delete was only 0.104!!
I can't figure out how a more straightforward way of describing the same exact operation could be so much crappier. Perhaps you can see the error of my ways where I cannot.
Some stats on the table: It has 1.9 million rows, and each MERGE operation inserts, updates, or deletes more than 100 of them. In my test case, only 1 is affected.
-- This table variable has the EXACT same structure as the published table
-- Yes, I've tried a temp table instead of a table variable, and it makes no difference
declare @tSource table
(
Key1 uniqueidentifier NOT NULL,
Key2 int NOT NULL,
Data1 datetime NOT NULL,
Data2 datetime,
Data3 varchar(255) NOT NULL,
PRIMARY KEY
(
Key1,
Key2
)
)
-- Fill the temp table with the desired current state of the view model, for
-- only those rows affected by @Key1. I'm not really concerned about the
-- performance of this. The result of this; it's already good. This results
-- in very few rows in the table var, in fact, only 1 in my test case
insert into @tSource
select *
from vw_Source_View with (nolock)
where Key1 = @Key1
-- Now it's time to merge @tSource into TargetTable
;MERGE TargetTable as T
USING tSource S
on S.Key1 = T.Key1 and S.Key2 = T.Key2
-- Only update if the Data columns do not match
WHEN MATCHED AND T.Data1 <> S.Data1 OR T.Data2 <> S.Data2 OR T.Data3 <> S.Data3 THEN
UPDATE SET
T.Data1 = S.Data1,
T.Data2 = S.Data2,
T.Data3 = S.Data3
-- Insert when missing in the target
WHEN NOT MATCHED BY TARGET THEN
INSERT (Key1, Key2, Data1, Data2, Data3)
VALUES (Key1, Key2, Data1, Data2, Data3)
-- Delete when missing in the source, being careful not to delete the REST
-- of the table by applying the T.Key1 = @id condition
WHEN NOT MATCHED BY SOURCE AND T.Key1 = @id THEN
DELETE
;
So how does this get to 1200 subtree cost? The data access from the tables themselves seems to be quite efficient. In fact, 87% of the cost of the MERGE seems to be from a Sort operation near the end of the chain:
MERGE (0%) <- Index Update (12%) <- Sort (87%) <- (...)
And that sort has 0 rows feeding into and out from it. Why does it take 87% of the resources to sort 0 rows?
UPDATE
I posted the actual (not estimated) execution plan for just the MERGE operation in a Gist.