I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:
stackoverflow.com/.../update-sql-table-with-random-value-from-other-table
When I I tried this, all of my "random" values that are inserted are exactly the same for all 3000 records.
When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:
- SQL Server is optimizing the
SELECT
somehow, not allowing the subquery to be evaluated more than once - The random value's seed is the same on every record the query updates
I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?
This is the code I'm using:
DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100))
INSERT INTO @randomStuff ([id], [val])
VALUES ( 1, 'Test Value 1' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 2, 'Test Value 2' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 3, 'Test Value 3' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 4, 'Test Value 4' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 5, 'Test Value 5' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 6, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 7, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 8, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 9, null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 10, null )
UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
There is no optimization going on here.
Your using a subquery that selects a single value, there is nothing to optimize.
You can also try putting a column from the table your updating in the select and see if that changes anything. That may trigger an evaluation for every row in MyTable
I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value that it would not optimize it out.
then you would have
When the query engine sees this...
... it's all like, "ooooh, a cachable scalar subquery, I'm gonna cache that!"
You need to trick the query engine into thinking it's non-cachable. jfar's answer was close, but the query engine was smart enough to see the tautalogy of
MyTable.MyColumn = MyTable.MyColumn
, but it ain't smart enough to see through this.By bringing in the outer table (MT) into the subquery, the query engine assumes subquery will need to be re-evaluated. Anything will work really, but I went with the (assumed) primary key of MyTable.Id since it'd be indexed and would add very little overhead.
A cursor would probably be just as fast, but is most certainly not as fun.
use a cross join to generate random data
I came up with a solution which is a bit of a hack and very inefficient (10~ seconds to update 3000 records). Because this is being used to generate test data, I don't have to be concerned about speed however.
In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:
I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.
Once @randomStuff is set up, we do this (note in my case, @MyTable is a table variable, adjust accordingly for your normal table):
so at this point, we have an intermediate table with every combination of (mytable id, random value), and a random sort value for each row specific to that combination. Then
This is an old trick which deletes all rows for a given MyTable.id except for the one with the lower sort value -- join the table to itself where the value is smaller, and delete any where such a join succeeded. This just leaves behind the lowest value. So for each MyTable.id, we just have one (random) value left.. Then we just plug it back into the table:
And you're done!
I said it was hacky...