NOT IN subquery fails when there are NULL-valued r

2019-05-18 16:52发布

Sorry guys, I had no idea how to phrase this one, but I have the following in a where clause:

person_id not in (
    SELECT distinct person_id
    FROM protocol_application_log_devl pal
    WHERE pal.set_id = @set_id
)

When the subquery returns no results, my whole select fails to return anything. To work around this, I replaced person_id in the subquery with isnull(person_id, '00000000-0000-0000-0000-000000000000').

It seems to work, but is there a better way to solve this?

2条回答
Melony?
2楼-- · 2019-05-18 17:22

It is better to use NOT EXISTS anyway:

WHERE NOT EXISTS(
    SELECT 1 FROM protocol_application_log_devl pal
    WHERE pal.person_id = person_id
     AND  pal.set_id = @set_id
)

Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?

A pattern I see quite a bit, and wish that I didn't, is NOT IN. When I see this pattern, I cringe. But not for performance reasons – after all, it creates a decent enough plan in this case:

The main problem is that the results can be surprising if the target column is NULLable (SQL Server processes this as a left anti semi join, but can't reliably tell you if a NULL on the right side is equal to – or not equal to – the reference on the left side). Also, optimization can behave differently if the column is NULLable, even if it doesn't actually contain any NULL values

Instead of NOT IN, use a correlated NOT EXISTS for this query pattern. Always. Other methods may rival it in terms of performance, when all other variables are the same, but all of the other methods introduce either performance problems or other challenges.

查看更多
一纸荒年 Trace。
3楼-- · 2019-05-18 17:33

While I support Tim's answer as being correct-in-practice (NOT IN is not appropriate here), this is an interesting case noted in the IN / NOT IN documentation:

Caution: Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results1.

This is why the isnull "fixes" the problem - it masks any such NULL values and avoids the unexpected behavior. With that in mind, the following approach would also work (but please heed the advice about not using NOT IN to begin with):

person_id not in (
  SELECT distinct person_id
  FROM protocol_application_log_devl pal
  WHERE pal.set_id = @set_id
    AND person_id NOT NULL    -- guard here
)

However, a NULL person_id is suspicious and might indicate other issues ..

1 Here is the Proof pudding:

select case when 1 not in (2)       then 1 else 0 end as r1,
       case when 1 not in (2, NULL) then 1 else 0 end as r2
-- r1: 1, r2: 0
查看更多
登录 后发表回答