How can I insert random values into a SQL Server t

2019-01-19 00:22发布

I'm trying to randomly insert values from a list of pre-defined values into a table for testing. I tried using the solution found on this StackOverflow question:

stackoverflow.com/.../update-sql-table-with-random-value-from-other-table

When I I tried this, all of my "random" values that are inserted are exactly the same for all 3000 records.

When I run the part of the query that actually selects the random row, it does select a random record every time I run it by hand, so I know the query works. My best guesses as to what is happening are:

  • SQL Server is optimizing the SELECT somehow, not allowing the subquery to be evaluated more than once
  • The random value's seed is the same on every record the query updates

I'm stuck on what my options are. Am I doing something wrong, or is there another way I should be doing this?

This is the code I'm using:

DECLARE @randomStuff TABLE ([id] INT, [val] VARCHAR(100))

INSERT INTO @randomStuff ([id], [val]) 
VALUES ( 1,  'Test Value 1' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 2,  'Test Value 2' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 3,  'Test Value 3' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 4,  'Test Value 4' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 5,  'Test Value 5' )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 6,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 7,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 8,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 9,  null )
INSERT INTO @randomStuff ([id], [val])
VALUES ( 10, null )

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())

6条回答
看我几分像从前
2楼-- · 2019-01-19 00:26

There is no optimization going on here.

Your using a subquery that selects a single value, there is nothing to optimize.

You can also try putting a column from the table your updating in the select and see if that changes anything. That may trigger an evaluation for every row in MyTable

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID()
    WHERE MyTable.MyColumn = MyTable.MyColumn )
查看更多
在下西门庆
3楼-- · 2019-01-19 00:28

I don't have time to check this right now, but my gut tells me that if you were to create a function on the server to get the random value that it would not optimize it out.

then you would have

UPDATE MyTable
Set MyColumn = dbo.RANDOM_VALUE()
查看更多
地球回转人心会变
4楼-- · 2019-01-19 00:36

When the query engine sees this...

(SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())

... it's all like, "ooooh, a cachable scalar subquery, I'm gonna cache that!"

You need to trick the query engine into thinking it's non-cachable. jfar's answer was close, but the query engine was smart enough to see the tautalogy of MyTable.MyColumn = MyTable.MyColumn, but it ain't smart enough to see through this.

UPDATE MyTable
   SET MyColumn = (SELECT TOP 1 val
                     FROM @randomStuff r
                          INNER JOIN MyTable _MT
                                  ON M.Id = _MT.Id
                    ORDER BY NEWID())
 FROM MyTable M

By bringing in the outer table (MT) into the subquery, the query engine assumes subquery will need to be re-evaluated. Anything will work really, but I went with the (assumed) primary key of MyTable.Id since it'd be indexed and would add very little overhead.

A cursor would probably be just as fast, but is most certainly not as fun.

查看更多
霸刀☆藐视天下
5楼-- · 2019-01-19 00:36

use a cross join to generate random data

查看更多
beautiful°
6楼-- · 2019-01-19 00:39

I came up with a solution which is a bit of a hack and very inefficient (10~ seconds to update 3000 records). Because this is being used to generate test data, I don't have to be concerned about speed however.

In this solution, I iterate over every row in the table and update the values one row at a time. It seems to work:

DECLARE @rows INT 
DECLARE @currentRow INT

SELECT @rows = COUNT(*) FROM dbo.MyTable
SET @currentRow = 1

WHILE @currentRow < @rows
BEGIN 

UPDATE MyTable
SET MyColumn = (SELECT TOP 1 [val] FROM @randomStuff ORDER BY NEWID())
WHERE MyPrimaryKey = (SELECT b.MyPrimaryKey
 FROM(SELECT a.MyPrimaryKey, ROW_NUMBER() OVER (ORDER BY MyPrimaryKey) AS rownumber
      FROM MyTable a) AS b
 WHERE @currentRow = b.rownumber
)

SET @currentRow = @currentRow + 1
END 
查看更多
再贱就再见
7楼-- · 2019-01-19 00:41

I've had a play with this, and found a rather hacky way to do it with the use of an intermediate table variable.

Once @randomStuff is set up, we do this (note in my case, @MyTable is a table variable, adjust accordingly for your normal table):

DECLARE @randomMappings TABLE (id INT, val VARCHAR(100), sorter UNIQUEIDENTIFIER)

INSERT INTO @randomMappings 
SELECT M.id, val, NEWID() AS sort 
FROM @MyTable AS M 
CROSS JOIN @randomstuff

so at this point, we have an intermediate table with every combination of (mytable id, random value), and a random sort value for each row specific to that combination. Then

DELETE others FROM @randomMappings AS others 
INNER JOIN @randomMappings AS lower 
ON (lower.id = others.id) AND (lower.sorter < others.sorter)

This is an old trick which deletes all rows for a given MyTable.id except for the one with the lower sort value -- join the table to itself where the value is smaller, and delete any where such a join succeeded. This just leaves behind the lowest value. So for each MyTable.id, we just have one (random) value left.. Then we just plug it back into the table:

UPDATE @MyTable
SET MyColumn = random.val
FROM @MyTable m, @randomMappings AS random
WHERE (random.id = m.id)

And you're done!

I said it was hacky...

查看更多
登录 后发表回答