SQL Server random using seed

2019-07-30 01:38发布

I want to add a column to my table with a random number using seed. If I use RAND:

select *, RAND(5) as random_id from myTable

I get an equal value(0.943597390424144 for example) for all the rows, in the random_id column. I want this value to be different for every row - and that for every time I will pass it 0.5 value(for example), it would be the same values again(as seed should work...).

How can I do this?

( For example, in PostrgreSql I can write

SELECT setseed(0.5); SELECT t.* , random() as random_id FROM myTable t

And I will get different values in each row. )


Edit:

After I saw the comments here, I have managed to work this out somehow - but it's not efficient at all. If someone has an idea how to improve it - it will be great. If not - I will have to find another way.

I used the basic idea of the example in here.

Creating a temporary table with blank seed value:

select * into t_myTable  from (
select t.*, -1.00000000000000000 as seed
       from myTable t
       ) as temp

Adding a random number for each seed value, one row at a time(this is the bad part...):

USE CPatterns;
GO
DECLARE @seed float;
DECLARE @id int;
DECLARE VIEW_CURSOR CURSOR FOR
select id
from t_myTable t;
OPEN VIEW_CURSOR;
FETCH NEXT FROM VIEW_CURSOR
into @id;
set @seed = RAND(5);

WHILE @@FETCH_STATUS = 0
   BEGIN
      set @seed = RAND();
         update t_myTable set seed = @seed where id = @id

      FETCH NEXT FROM VIEW_CURSOR
         into @id;

   END;
CLOSE VIEW_CURSOR;
DEALLOCATE VIEW_CURSOR;
GO

Creating the view using the seed value and ordering by it

create view my_view AS 
select row_number() OVER (ORDER BY seed, id) AS  source_id ,t.*
       from t_myTable t

4条回答
该账号已被封号
2楼-- · 2019-07-30 02:15

I think the simplest way to get a repeatable random id in a table is to use row_number() or a fixed id on each row. Let me assume that you have a column called id with a different value on each row.

The idea is just to use this as a seed:

select rand(id*1), as random_id
from mytable;

Note that the seed for the id is an integer and not a floating point number. If you wanted a floating point seed, you could do something with checksum():

select rand(checksum(id*0.5)) as random_id
. . .

If you are doing this for sampling (where you will say random_id < 0.1 for a 10% sample for instance, then I often use modulo arithmetic on row_number():

with t as (
      select t.* row_number() over (order by id) as seqnum
      from mytable t
     )
select *
from t
where ((seqnum * 17 + 71) % 101) < 0.1

This returns about 10% of the numbers (okay, really 10/101). And you can adjust the sample by fiddling with the constants.

查看更多
迷人小祖宗
3楼-- · 2019-07-30 02:18

Someone sugested a similar query using newid() but I'm giving you the solution that works for me.

There's a workaround that involves newid() instead of rand, but it gives you the same result. You can execute it individually or as a column in a column. It will result in a random value per row rather than the same value for every row in the select statement. If you need a random number from 0 - N, just change 100 for the desired number.

SELECT TOP 10 [Flag forca]
,1+ABS(CHECKSUM(NEWID())) % 100 AS RANDOM_NEWID
,RAND() AS  RANDOM_RAND
FROM PAGSEGURO_WORK.dbo.jobSTM248_tmp_leitores_iso

Result

查看更多
萌系小妹纸
4楼-- · 2019-07-30 02:31

You could convert a random number from the seed:

rand(row_number over (order by ___, ___,___))

Then cast that as a varchar , Then use the last 3 characters as another seed. That would give you a nice random value:

rand(right(cast(rand(row_number() over(x,y,x)) as varchar(15)), 3)
查看更多
我想做一个坏孩纸
5楼-- · 2019-07-30 02:38

So, in case it would someone someday, here's what I eventually did.

I'm generating the random seeded values in the server side(Java in my case), and then create a table with two columns: the id and the generated random_id. Now I create the view as an inner join between the table and the original data.

The generated SQL looks something like that:

CREATE TABLE SEED_DATA(source_id INT PRIMARY KEY, random_id float NOT NULL);
select Rand(5); 
insert into SEED_DATA values(1,Rand());
insert into SEED_DATA values(2, Rand());
insert into SEED_DATA values(3, Rand());
.
.
.
insert into SEED_DATA values(1000000, Rand());

and

CREATE VIEW DATA_VIEW
as  
    SELECT row_number() OVER (ORDER BY random_id, id) AS source_id,column1,column2,...
    FROM 
        ( select * from SEED_DATA tmp 
          inner join my_table i on tmp.source_id = i.id) TEMP 

In addition, I create the random numbers in batches, 10,000 or so in each batch(may be higher), so it will not weigh heavily on the server side, and for each batch I insert it to the table in a separate execution.

All of that because I couldn't find a good way to do what I want purely in SQL. Updating row after row is really not efficient.

My own conclusion from this story is that SQL Server is sometimes really annoying...

查看更多
登录 后发表回答