This issue came up when I got different records counts for what I thought were identical queries one using a not in
where
constraint and the other a left join
. The table in the not in
constraint had one null value (bad data) which caused that query to return a count of 0 records. I sort of understand why but I could use some help fully grasping the concept.
To state it simply, why does query A return a result but B doesn't?
A: select 'true' where 3 in (1, 2, 3, null)
B: select 'true' where 3 not in (1, 2, null)
This was on SQL Server 2005. I also found that calling set ansi_nulls off
causes B to return a result.
also this might be of use to know the logical difference between join, exists and in http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx
Query A is the same as:
Since
3 = 3
is true, you get a result.Query B is the same as:
When
ansi_nulls
is on,3 <> null
is UNKNOWN, so the predicate evaluates to UNKNOWN, and you don't get any rows.When
ansi_nulls
is off,3 <> null
is true, so the predicate evaluates to true, and you get a row.IF you want to filter with NOT IN for a subquery containg NULLs justcheck for not null
The title of this question at the time of writing is
From the text of the question it appears that the problem was occurring in a SQL DML
SELECT
query, rather than a SQL DDLCONSTRAINT
.However, especially given the wording of the title, I want to point out that some statements made here are potentially misleading statements, those along the lines of (paraphrasing)
Although this is the case for SQL DML, when considering constraints the effect is different.
Consider this very simple table with two constraints taken directly from the predicates in the question (and addressed in an excellent answer by @Brannon):
As per @Brannon's answer, the first constraint (using
IN
) evaluates to TRUE and the second constraint (usingNOT IN
) evaluates to UNKNOWN. However, the insert succeeds! Therefore, in this case it is not strictly correct to say, "you don't get any rows" because we have indeed got a row inserted as a result.The above effect is indeed the correct one as regards the SQL-92 Standard. Compare and contrast the following section from the SQL-92 spec
In other words:
In SQL DML, rows are removed from the result when the
WHERE
evaluates to UNKNOWN because it does not satisfy the condition "is true".In SQL DDL (i.e. constraints), rows are not removed from the result when they evaluate to UNKNOWN because it does satisfy the condition "is not false".
Although the effects in SQL DML and SQL DDL respectively may seem contradictory, there is practical reason for giving UNKNOWN results the 'benefit of the doubt' by allowing them to satisfy a constraint (more correctly, allowing them to not fail to satisfy a constraint): without this behaviour, every constraints would have to explicitly handle nulls and that would be very unsatisfactory from a language design perspective (not to mention, a right pain for coders!)
p.s. if you are finding it as challenging to follow such logic as "unknown does not fail to satisfy a constraint" as I am to write it, then consider you can dispense with all this simply by avoiding nullable columns in SQL DDL and anything in SQL DML that produces nulls (e.g. outer joins)!
Null signifies and absence of data, that is it is unknown, not a data value of nothing. It's very easy for people from a programming background to confuse this because in C type languages when using pointers null is indeed nothing.
Hence in the first case 3 is indeed in the set of (1,2,3,null) so true is returned
In the second however you can reduce it to
select 'true' where 3 not in (null)
So nothing is returned because the parser knows nothing about the set to which you are comparing it - it's not an empty set but an unknown set. Using (1, 2, null) doesn't help because the (1,2) set is obviously false, but then you're and'ing that against unknown, which is unknown.