NOT IN clause and NULL values

2018-12-31 02:31发布

This issue came up when I got different records counts for what I thought were identical queries one using a not in where constraint and the other a left join. The table in the not in constraint had one null value (bad data) which caused that query to return a count of 0 records. I sort of understand why but I could use some help fully grasping the concept.

To state it simply, why does query A return a result but B doesn't?

A: select 'true' where 3 in (1, 2, 3, null)
B: select 'true' where 3 not in (1, 2, null)

This was on SQL Server 2005. I also found that calling set ansi_nulls off causes B to return a result.

11条回答
谁念西风独自凉
2楼-- · 2018-12-31 03:09

Compare to null is undefined, unless you use IS NULL.

So, when comparing 3 to NULL (query A), it returns undefined.

I.e. SELECT 'true' where 3 in (1,2,null) and SELECT 'true' where 3 not in (1,2,null)

will produce the same result, as NOT (UNDEFINED) is still undefined, but not TRUE

查看更多
冷夜・残月
3楼-- · 2018-12-31 03:09

It may be concluded from answers here that NOT IN (subquery) doesn't handle nulls correctly and should be avoided in favour of NOT EXISTS. However, such a conclusion may be premature. In the following scenario, credited to Chris Date (Database Programming and Design, Vol 2 No 9, September 1989), it is NOT IN that handles nulls correctly and returns the correct result, rather than NOT EXISTS.

Consider a table sp to represent suppliers (sno) who are known to supply parts (pno) in quantity (qty). The table currently holds the following values:

      VALUES ('S1', 'P1', NULL), 
             ('S2', 'P1', 200),
             ('S3', 'P1', 1000)

Note that quantity is nullable i.e. to be able to record the fact a supplier is known to supply parts even if it is not known in what quantity.

The task is to find the suppliers who are known supply part number 'P1' but not in quantities of 1000.

The following uses NOT IN to correctly identify supplier 'S2' only:

WITH sp AS 
     ( SELECT * 
         FROM ( VALUES ( 'S1', 'P1', NULL ), 
                       ( 'S2', 'P1', 200 ),
                       ( 'S3', 'P1', 1000 ) )
              AS T ( sno, pno, qty )
     )
SELECT DISTINCT spx.sno
  FROM sp spx
 WHERE spx.pno = 'P1'
       AND 1000 NOT IN (
                        SELECT spy.qty
                          FROM sp spy
                         WHERE spy.sno = spx.sno
                               AND spy.pno = 'P1'
                       );

However, the below query uses the same general structure but with NOT EXISTS but incorrectly includes supplier 'S1' in the result (i.e. for which the quantity is null):

WITH sp AS 
     ( SELECT * 
         FROM ( VALUES ( 'S1', 'P1', NULL ), 
                       ( 'S2', 'P1', 200 ),
                       ( 'S3', 'P1', 1000 ) )
              AS T ( sno, pno, qty )
     )
SELECT DISTINCT spx.sno
  FROM sp spx
 WHERE spx.pno = 'P1'
       AND NOT EXISTS (
                       SELECT *
                         FROM sp spy
                        WHERE spy.sno = spx.sno
                              AND spy.pno = 'P1'
                              AND spy.qty = 1000
                      );

So NOT EXISTS is not the silver bullet it may have appeared!

Of course, source of the problem is the presence of nulls, therefore the 'real' solution is to eliminate those nulls.

This can be achieved (among other possible designs) using two tables:

  • sp suppliers known to supply parts
  • spq suppliers known to supply parts in known quantities

noting there should probably be a foreign key constraint where spq references sp.

The result can then be obtained using the 'minus' relational operator (being the EXCEPT keyword in Standard SQL) e.g.

WITH sp AS 
     ( SELECT * 
         FROM ( VALUES ( 'S1', 'P1' ), 
                       ( 'S2', 'P1' ),
                       ( 'S3', 'P1' ) )
              AS T ( sno, pno )
     ),
     spq AS 
     ( SELECT * 
         FROM ( VALUES ( 'S2', 'P1', 200 ),
                       ( 'S3', 'P1', 1000 ) )
              AS T ( sno, pno, qty )
     )
SELECT sno
  FROM spq
 WHERE pno = 'P1'
EXCEPT 
SELECT sno
  FROM spq
 WHERE pno = 'P1'
       AND qty = 1000;
查看更多
无色无味的生活
4楼-- · 2018-12-31 03:10

NOT IN returns 0 records when compared against an unknown value

Since NULL is an unknown, a NOT IN query containing a NULL or NULLs in the list of possible values will always return 0 records since there is no way to be sure that the NULL value is not the value being tested.

查看更多
有味是清欢
5楼-- · 2018-12-31 03:10

In A, 3 is tested for equality against each member of the set, yielding (FALSE, FALSE, TRUE, UNKNOWN). Since one of the elements is TRUE, the condition is TRUE. (It's also possible that some short-circuiting takes place here, so it actually stops as soon as it hits the first TRUE and never evaluates 3=NULL.)

In B, I think it is evaluating the condition as NOT (3 in (1,2,null)). Testing 3 for equality against the set yields (FALSE, FALSE, UNKNOWN), which is aggregated to UNKNOWN. NOT ( UNKNOWN ) yields UNKNOWN. So overall the truth of the condition is unknown, which at the end is essentially treated as FALSE.

查看更多
人气声优
6楼-- · 2018-12-31 03:11

Whenever you use NULL you are really dealing with a Three-Valued logic.

Your first query returns results as the WHERE clause evaluates to:

    3 = 1 or 3 = 2 or 3 = 3 or 3 = null
which is:
    FALSE or FALSE or TRUE or UNKNOWN
which evaluates to 
    TRUE

The second one:

    3 <> 1 and 3 <> 2 and 3 <> null
which evaluates to:
    TRUE and TRUE and UNKNOWN
which evaluates to:
    UNKNOWN

The UNKNOWN is not the same as FALSE you can easily test it by calling:

select 'true' where 3 <> null
select 'true' where not (3 <> null)

Both queries will give you no results

If the UNKNOWN was the same as FALSE then assuming that the first query would give you FALSE the second would have to evaluate to TRUE as it would have been the same as NOT(FALSE).
That is not the case.

There is a very good article on this subject on SqlServerCentral.

The whole issue of NULLs and Three-Valued Logic can be a bit confusing at first but it is essential to understand in order to write correct queries in TSQL

Another article I would recommend is SQL Aggregate Functions and NULL.

查看更多
不再属于我。
7楼-- · 2018-12-31 03:12

this is for Boy:

select party_code 
from abc as a
where party_code not in (select party_code 
                         from xyz 
                         where party_code = a.party_code);

this works regardless of ansi settings

查看更多
登录 后发表回答