Disclaimer: I have figured out the problem (I think), but I wanted to add this issue to Stack Overflow since I couldn't (easily) find it anywhere. Also, someone might have a better answer than I do.
I have a database where one table "Common" is referenced by several other tables. I wanted to see what records in the Common table were orphaned (i.e., had no references from any of the other tables).
I ran this query:
select *
from Common
where common_id not in (select common_id from Table1)
and common_id not in (select common_id from Table2)
I know that there are orphaned records, but no records were returned. Why not?
(This is SQL Server, if it matters.)
Just off the top of my head...
I ran a few tests and here were my results w.r.t. @patmortech's answer and @rexem's comments.
If either Table1 or Table2 is not indexed on commonID, you get a table scan but @patmortech's query is still twice as fast (for a 100K row master table).
If neither are indexed on commonID, you get two table scans and the difference is negligible.
If both are indexed on commonID, the "not exists" query runs in 1/3 the time.
Table1 or Table2 has some null values for common_id. Use this query instead:
Update:
These articles in my blog describe the differences between the methods in more detail:
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:SQL Server
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:PostgreSQL
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:Oracle
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:MySQL
There are three ways to do such a query:
LEFT JOIN / IS NULL
:NOT EXISTS
:NOT IN
:When
table1.common_id
is not nullable, all these queries are semantically the same.When it is nullable,
NOT IN
is different, sinceIN
(and, therefore,NOT IN
) returnNULL
when a value does not match anything in a list containing aNULL
.This may be confusing but may become more obvious if we recall the alternate syntax for this:
The result of this condition is a boolean product of all comparisons within the list. Of course, a single
NULL
value yields theNULL
result which renders the whole resultNULL
too.We never cannot say definitely that
common_id
is not equal to anything from this list, since at least one of the values isNULL
.Suppose we have these data:
LEFT JOIN / IS NULL
andNOT EXISTS
will return3
,NOT IN
will return nothing (since it will always evaluate to eitherFALSE
orNULL
).In
MySQL
, in case on non-nullable column,LEFT JOIN / IS NULL
andNOT IN
are a little bit (several percent) more efficient thanNOT EXISTS
. If the column is nullable,NOT EXISTS
is the most efficient (again, not much).In
Oracle
, all three queries yield same plans (anANTI JOIN
).In
SQL Server
,NOT IN
/NOT EXISTS
are more efficient, sinceLEFT JOIN / IS NULL
cannot be optimized to anANTI JOIN
by its optimizer.In
PostgreSQL
,LEFT JOIN / IS NULL
andNOT EXISTS
are more efficient thanNOT IN
, sine they are optimized to anAnti Join
, whileNOT IN
useshashed subplan
(or even a plainsubplan
if the subquery is too large to hash)