I'm curious which of the following below would be more efficient?
I've always been a bit cautious about using IN
because I believe SQL Server turns the result set into a big IF
statement. For a large result set this could result in poor performance. For small results sets, I'm not sure either is preferable. For large result sets, wouldn't EXISTS
be more efficient?
WHERE EXISTS (SELECT * FROM Base WHERE bx.BoxID = Base.BoxID AND [Rank] = 2)
vs.
WHERE bx.BoxID IN (SELECT BoxID FROM Base WHERE [Rank = 2])
Off the top of my head and not guaranteed to be correct: I believe the second will be faster in this case.
IN
will short-circuit as soon as it finds a match.The accepted answer is shortsighted and the question a bit loose in that:
I believe the optimizer is smart enough to convert between "in" vs "exists" when there is a significant cost difference due to (1) and (2), otherwise it may just be used as a hint (e.g. exists to encourage use of an a seekable index on the right side).
Both forms can be converted to join forms internally, have the join order reversed, and run as loop, hash or merge--based on the estimated row counts (left and right) and index existence in left, right, or both sides.
I'd go with EXISTS over IN, see below link:
SQL Server: JOIN vs IN vs EXISTS - the logical difference