Inconsistent results with NEWID() and PERSISTED co

2019-06-21 02:09发布

问题:

I am getting odd results when using NEWID() in combination with a persistent computed column. Am I using some function wrong?

Not using persisted when creating the column, and therefore calculating values when selecting them, will return correct values. Updating the column (col1) will also return correct values.

DECLARE @test TABLE (
    Col1 INT,
    Contains2 AS CASE WHEN 2 IN (Col1) THEN 1 ELSE 0 END PERSISTED)

INSERT INTO @test (Col1) VALUES
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5))

SELECT * FROM @test
UPDATE @test SET Col1 = Col1*1
SELECT * FROM @test

/*
Col1    Contains2
2   0
2   0
0   1
4   0
3   0

Col1    Contains2
2   1
2   1
0   0
4   0
3   0
*/

回答1:

Apparently, the query engine calculates the random number twice for each row.

First time for Col1, second time for the CASE statement of the persisted column.

Optimiser doesn't know, or doesn't care in this case that NEWID is a non-deterministic function and calls it twice.

Actually, it may even not have a choice. Do you want optimiser to create a temporary table behind the scenes, populate its Col1 with results of the expression that generates random numbers, then read that temporary table back and use these saved intermediary results for calculating the result of the CASE expression, then perform the final INSERT? In this case, it is cheaper for optimiser to calculate the expression twice without writing intermediary results to disk. In some other cases (say, when you have not 5, but 5 billion rows, or extra indexes), the estimated costs may be different and this behaviour would change.

I don't think you can do much about it. Just be aware of this behaviour. Always explicitly save the generated set of random numbers to a table, then perform further calculations based on them.

I reproduced it in SQL Server 2008 and 2014. Here is an execution plan that I got in SQL Server 2008, but it is not really interesting. In 2014 the plan is the same, except there is no Top operator.

Constant Scan operator outputs a Union1009 list, which is used in Compute Scalar later. I guess, it comes down to implementation details of Constant Scan and/or Compute Scalar operators.

Observed behaviour tells us that newid() is called twice per row here.



回答2:

During testing, I removed the functions unrelated to NEWID, and showed the results if the NEWID were calculated ahead of time. It may be helpful to others.

DECLARE @test TABLE (
InsertType VARCHAR(30),
Col1 VARCHAR(5),
Contains2 AS CASE WHEN (Col1) LIKE '%2%' THEN 1 ELSE 0 END) --depends on Col1

INSERT INTO @test (InsertType, Col1) VALUES
    ('Compute With Insert', LEFT(NEWID(), 5)),
    ('Compute With Insert', LEFT(NEWID(), 5)),
    ('Compute With Insert', LEFT(NEWID(), 5)),
    ('Compute With Insert', LEFT(NEWID(), 5)),
    ('Compute With Insert', LEFT(NEWID(), 5))

SELECT * FROM @test

DECLARE @A VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @B VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @C VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @D VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @E VARCHAR(5) = LEFT(NEWID(), 5);

SELECT @A, @B, @C, @D, @E;

INSERT INTO @Test (InsertType, Col1) VALUES
('Compute Before Insert', @A), ('Compute Before Insert', @B), ('Compute Before Insert', @C), ('Compute Before Insert', @D), ('Compute Before Insert', @E)

SELECT * FROM @test

InsertType                 Col1        Contains2
Compute With Insert        C5507        0
Compute With Insert        C17D7        0
Compute With Insert        D9087        1
Compute With Insert        E2DB0        0
Compute With Insert        7D1AF        1
Compute Before Insert      31050        0
Compute Before Insert      2954C        1
Compute Before Insert      9E205        1
Compute Before Insert      DDF05        0
Compute Before Insert      ED708        0