Disclaimer: I have figured out the problem (I think), but I wanted to add this issue to Stack Overflow since I couldn't (easily) find it anywhere. Also, someone might have a better answer than I do.
I have a database where one table "Common" is referenced by several other tables. I wanted to see what records in the Common table were orphaned (i.e., had no references from any of the other tables).
I ran this query:
select *
from Common
where common_id not in (select common_id from Table1)
and common_id not in (select common_id from Table2)
I know that there are orphaned records, but no records were returned. Why not?
(This is SQL Server, if it matters.)
this worked for me :)
Let's suppose these values for common_id:
We want the row in Common to return, because it doesn't exist in any of the other tables. However, the null throws in a monkey wrench.
With those values, the query is equivalent to:
That is equivalent to:
This is where the problem starts. When comparing with a null, the answer is unknown. So the query reduces to
false or unknown is unknown:
true and not unkown is also unkown:
The where condition does not return records where the result is unkown, so we get no records back.
One way to deal with this is to use the exists operator rather than in. Exists never returns unkown because it operates on rows rather than columns. (A row either exists or it doesn't; none of this null ambiguity at the row level!)
If you want the world to be a two-valued boolean place, you must prevent the null (third value) case yourself.
Don't write IN clauses that allow nulls in the list side. Filter them out!