Can UNION ALL be faster than JOINs or do my JOINs

2019-01-25 21:34发布

问题:

I have a Notes table with a uniqueidentifier column that I use as a FK for a variety of other tables in the database (don't worry, the uniqueidentifier columns on the other tables aren't clustered PKs). These other tables represent something of a hierarchy of business objects. As a simple representation, let's say I have two other tables:

  • Leads (PK LeadID)
  • Quotes (PK QuoteID, FK LeadID)

In the display of a Lead in the application, I need to show all notes related to the lead, including those tagged to any Quote that belongs to that lead. I have two options as far as I can see — either a UNION ALL or several LEFT JOIN statements. Here's how they'd look:

SELECT N.*  
FROM Notes N  
JOIN Leads L ON N.TargetUniqueID = L.UniqueID  
WHERE L.LeadID = @LeadID

UNION ALL

SELECT N.*  
FROM Notes N  
JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID  
WHERE Q.LeadID = @LeadID 

Or...

SELECT N.*  
FROM Notes N  
LEFT JOIN Leads L ON N.TargetUniqueID = L.UniqueID  
LEFT JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID  
WHERE L.LeadID = @LeadID OR Q.LeadID = @LeadID

In real life I have a total of five tables that the notes could be attached to, and that number could grow as the application grows. I already have non-clustered indexes set up on the uniqueidentifier columns I'm using, and SQL Profiler says I can't make any more improvements, but when I do a performance test on a realistically-sized test data set, I get the following numbers:

  • UNION ALL — 0.010 sec
  • LEFT JOIN — 0.744 sec

I had always heard that using UNION was bad, and that UNION ALL was only marginally better, but the performance numbers don't seem to bear that out. Granted, the UNION ALL SQL code might be more of a pain to maintain, but at that kind of performance difference it's probably worth it.

So is UNION ALL really better here or am I missing something on the LEFT JOIN code that is slowing things down?

回答1:

The UNION ALL version would probably be satisfied quite easily by 2 index seeks. OR can lead to scans. What do the execution plans look like?

Also have you tried this to avoid accessing Notes twice?

;WITH J AS
(
SELECT UniqueID FROM Leads WHERE LeadID = @LeadID
UNION ALL
SELECT UniqueID FROM Quotes WHERE LeadID = @LeadID
)

SELECT N.*  /*Don't use * though!*/
FROM Notes N  
JOIN J ON N.TargetUniqueID = J.UniqueID  


回答2:

I may be wrong, but I think that you will get a better performance if you rewrite you JOIN version to

SELECT N.*  
FROM Notes N  
LEFT JOIN Leads L ON N.TargetUniqueID = L.UniqueID AND L.LeadID = @LeadID  
LEFT JOIN Quotes Q ON N.TargetUniqueID = Q.UniqueID  AND Q.LeadID = @LeadID
WHERE Q.LeadID IS NOT NULL OR L.LeadID IS NOT NULL


回答3:

In my experience, SQL Server is really bad with join conditions containing OR. I also use UNIONs in that case, and I got similar results like you (maybe half a second instead of 20).

Who said UNIONS are bad? Especially if you use UNION ALL, there should not be a performance hit, as UNION would have to go through the result to only keep unique records (actually doing something like distinct or group by).



回答4:

You second query wouldn' even give correct results as it would covert the left joins to inner joins, see here for explantion as to why your syntax is bad:

http://wiki.lessthandot.com/index.php/WHERE_conditions_on_a_LEFT_JOIN



回答5:

UNION is slower, but UNION ALL should be pretty quick, right?