I am getting odd results when using NEWID() in combination with a persistent computed column. Am I using some function wrong?
Not using persisted when creating the column, and therefore calculating values when selecting them, will return correct values.
Updating the column (col1) will also return correct values.
DECLARE @test TABLE (
Col1 INT,
Contains2 AS CASE WHEN 2 IN (Col1) THEN 1 ELSE 0 END PERSISTED)
INSERT INTO @test (Col1) VALUES
(ABS(CHECKSUM(NEWID()) % 5)),
(ABS(CHECKSUM(NEWID()) % 5)),
(ABS(CHECKSUM(NEWID()) % 5)),
(ABS(CHECKSUM(NEWID()) % 5)),
(ABS(CHECKSUM(NEWID()) % 5))
SELECT * FROM @test
UPDATE @test SET Col1 = Col1*1
SELECT * FROM @test
/*
Col1 Contains2
2 0
2 0
0 1
4 0
3 0
Col1 Contains2
2 1
2 1
0 0
4 0
3 0
*/
Apparently, the query engine calculates the random number twice for each row.
First time for Col1
, second time for the CASE
statement of the persisted column.
Optimiser doesn't know, or doesn't care in this case that NEWID
is a non-deterministic function and calls it twice.
Actually, it may even not have a choice. Do you want optimiser to create a temporary table behind the scenes, populate its Col1
with results of the expression that generates random numbers, then read that temporary table back and use these saved intermediary results for calculating the result of the CASE
expression, then perform the final INSERT
? In this case, it is cheaper for optimiser to calculate the expression twice without writing intermediary results to disk. In some other cases (say, when you have not 5, but 5 billion rows, or extra indexes), the estimated costs may be different and this behaviour would change.
I don't think you can do much about it. Just be aware of this behaviour. Always explicitly save the generated set of random numbers to a table, then perform further calculations based on them.
I reproduced it in SQL Server 2008 and 2014.
Here is an execution plan that I got in SQL Server 2008, but it is not really interesting. In 2014 the plan is the same, except there is no Top
operator.
Constant Scan
operator outputs a Union1009
list, which is used in Compute Scalar
later. I guess, it comes down to implementation details of Constant Scan
and/or Compute Scalar
operators.
Observed behaviour tells us that newid()
is called twice per row here.
During testing, I removed the functions unrelated to NEWID, and showed the results if the NEWID were calculated ahead of time. It may be helpful to others.
DECLARE @test TABLE (
InsertType VARCHAR(30),
Col1 VARCHAR(5),
Contains2 AS CASE WHEN (Col1) LIKE '%2%' THEN 1 ELSE 0 END) --depends on Col1
INSERT INTO @test (InsertType, Col1) VALUES
('Compute With Insert', LEFT(NEWID(), 5)),
('Compute With Insert', LEFT(NEWID(), 5)),
('Compute With Insert', LEFT(NEWID(), 5)),
('Compute With Insert', LEFT(NEWID(), 5)),
('Compute With Insert', LEFT(NEWID(), 5))
SELECT * FROM @test
DECLARE @A VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @B VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @C VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @D VARCHAR(5) = LEFT(NEWID(), 5);
DECLARE @E VARCHAR(5) = LEFT(NEWID(), 5);
SELECT @A, @B, @C, @D, @E;
INSERT INTO @Test (InsertType, Col1) VALUES
('Compute Before Insert', @A), ('Compute Before Insert', @B), ('Compute Before Insert', @C), ('Compute Before Insert', @D), ('Compute Before Insert', @E)
SELECT * FROM @test
InsertType Col1 Contains2
Compute With Insert C5507 0
Compute With Insert C17D7 0
Compute With Insert D9087 1
Compute With Insert E2DB0 0
Compute With Insert 7D1AF 1
Compute Before Insert 31050 0
Compute Before Insert 2954C 1
Compute Before Insert 9E205 1
Compute Before Insert DDF05 0
Compute Before Insert ED708 0