T-SQL Case statement strange behavior with newid()

2019-02-13 10:47发布

问题:

I'm using SQL Server 2012.

If I do the following to get a list of random-ish numbers in the range [1,3], it works just fine:

SELECT TOP 100 
    ABS(CHECKSUM(NEWID()))%3 + 1 [value_of_rand]
FROM sys.objects

and I get nice things like this (all between 1 and 3).

3
2
2
2
1
....etc.

But if I then put the output of that same chained-random-value function into a CASE statement, it apparently does not produce only the values 1,2,3.

SELECT TOP 100 
    CASE (ABS(CHECKSUM(NEWID()))%3 + 1)
        WHEN 1
            THEN 'one'
        WHEN 2
            THEN 'two'
        WHEN 3
            THEN 'three'
        ELSE
            'that is strange'
    END [value_of_case]
FROM sys.objects

It outputs:

three
that is strange
that is strange
one
two
...etc

What am I doing wrong here?

回答1:

Your:

SELECT TOP 100 
    CASE (ABS(CHECKSUM(NEWID()))%3 + 1)
        WHEN 1
            THEN 'one'
        WHEN 2
            THEN 'two'
        WHEN 3
            THEN 'three'
        ELSE
            'that is strange'
    END [value_of_case]
FROM sys.objects

Actually executed:

SELECT TOP 100 
    CASE 
        WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 1  THEN 'one'
        WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 2  THEN 'two'
        WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 3  THEN 'three'
        ELSE 'that is strange'
    END [value_of_case]
FROM sys.objects;

Basically your expression is non-deterministic and each time is evaluated so you can end up with ELSE clause. So there is no bug or catch, just you use it with variable expression and it is perfectly normal behavior.

This is the same class like COALESCE syntactic-sugar

The COALESCE expression is a syntactic shortcut for the CASE expression. That is, the code COALESCE(expression1,...n) is rewritten by the query optimizer as the following CASE expression:

CASE

WHEN (expression1 IS NOT NULL) THEN expression1

WHEN (expression2 IS NOT NULL) THEN expression2

...

ELSE expressionN

END

This means that the input values (expression1, expression2, expressionN, etc.) will be evaluated multiple times. Also, in compliance with the SQL standard, a value expression that contains a subquery is considered non-deterministic and the subquery is evaluated twice. In either case, different results can be returned between the first evaluation and subsequent evaluations.

EDIT:

Solution: SqlFiddle

SELECT TOP 100 
    CASE t.col
        WHEN 1     THEN 'one'
        WHEN 2     THEN 'two'
        WHEN 3     THEN 'three'
        ELSE      'that is strange'
    END [value_of_case]
FROM sys.objects
CROSS APPLY ( SELECT ABS(CHECKSUM(NEWID()))%3 + 1 ) AS t(col)


回答2:

I think that the problem you are experiencing here is that (ABS(CHECKSUM(NEWID()))%3 + 1) isn't a value, it's an expression and SQL has the option of re-evaluating it whenever it wants to. You can try various syntactical things like removing the extra parenthesis or a CTE. That might make it go away (for now), but it might not, since logically it looks like the same request to the optimizer.

I think that the only sure-fire, supported way to stop this would be to save it first (to a temp table or a real one) and then use a second query to reference the saved values.



回答3:

I cant tell you why, it is indeed strange, but I can give you a workaround. Select the random values into a cte before trying to use them

;with rndsrc(value_of_rand) as
(
SELECT TOP 100 
    ABS(CHECKSUM(NEWID()))%3 + 1
FROM sys.objects
)
SELECT TOP 100 
CASE value_of_rand
    WHEN 1
        THEN 'one'
    WHEN 2
        THEN 'two'
    WHEN 3
        THEN 'three'
    ELSE
        'that is strange'
END [value_of_case]
from rndsrc

No more "that is strange"