可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I want to add a column to my table with a random number using seed.
If I use RAND:
select *, RAND(5) as random_id from myTable
I get an equal value(0.943597390424144 for example) for all the rows, in the random_id column. I want this value to be different for every row - and that for every time I will pass it 0.5 value(for example), it would be the same values again(as seed should work...).
How can I do this?
(
For example, in PostrgreSql I can write
SELECT setseed(0.5);
SELECT t.* , random() as random_id
FROM myTable t
And I will get different values in each row.
)
Edit:
After I saw the comments here, I have managed to work this out somehow - but it's not efficient at all.
If someone has an idea how to improve it - it will be great. If not - I will have to find another way.
I used the basic idea of the example in here.
Creating a temporary table with blank seed value:
select * into t_myTable from (
select t.*, -1.00000000000000000 as seed
from myTable t
) as temp
Adding a random number for each seed value, one row at a time(this is the bad part...):
USE CPatterns;
GO
DECLARE @seed float;
DECLARE @id int;
DECLARE VIEW_CURSOR CURSOR FOR
select id
from t_myTable t;
OPEN VIEW_CURSOR;
FETCH NEXT FROM VIEW_CURSOR
into @id;
set @seed = RAND(5);
WHILE @@FETCH_STATUS = 0
BEGIN
set @seed = RAND();
update t_myTable set seed = @seed where id = @id
FETCH NEXT FROM VIEW_CURSOR
into @id;
END;
CLOSE VIEW_CURSOR;
DEALLOCATE VIEW_CURSOR;
GO
Creating the view using the seed value and ordering by it
create view my_view AS
select row_number() OVER (ORDER BY seed, id) AS source_id ,t.*
from t_myTable t
回答1:
I think the simplest way to get a repeatable random id in a table is to use row_number()
or a fixed id
on each row. Let me assume that you have a column called id
with a different value on each row.
The idea is just to use this as a seed:
select rand(id*1), as random_id
from mytable;
Note that the seed for the id is an integer and not a floating point number. If you wanted a floating point seed, you could do something with checksum()
:
select rand(checksum(id*0.5)) as random_id
. . .
If you are doing this for sampling (where you will say random_id < 0.1
for a 10% sample for instance, then I often use modulo arithmetic on row_number()
:
with t as (
select t.* row_number() over (order by id) as seqnum
from mytable t
)
select *
from t
where ((seqnum * 17 + 71) % 101) < 0.1
This returns about 10% of the numbers (okay, really 10/101). And you can adjust the sample by fiddling with the constants.
回答2:
Someone sugested a similar query using newid() but I'm giving you the solution that works for me.
There's a workaround that involves newid() instead of rand, but it gives you the same result. You can execute it individually or as a column in a column. It will result in a random value per row rather than the same value for every row in the select statement.
If you need a random number from 0 - N, just change 100 for the desired number.
SELECT TOP 10 [Flag forca]
,1+ABS(CHECKSUM(NEWID())) % 100 AS RANDOM_NEWID
,RAND() AS RANDOM_RAND
FROM PAGSEGURO_WORK.dbo.jobSTM248_tmp_leitores_iso
回答3:
So, in case it would someone someday, here's what I eventually did.
I'm generating the random seeded values in the server side(Java in my case), and then create a table with two columns: the id and the generated random_id.
Now I create the view as an inner join
between the table and the original data.
The generated SQL looks something like that:
CREATE TABLE SEED_DATA(source_id INT PRIMARY KEY, random_id float NOT NULL);
select Rand(5);
insert into SEED_DATA values(1,Rand());
insert into SEED_DATA values(2, Rand());
insert into SEED_DATA values(3, Rand());
.
.
.
insert into SEED_DATA values(1000000, Rand());
and
CREATE VIEW DATA_VIEW
as
SELECT row_number() OVER (ORDER BY random_id, id) AS source_id,column1,column2,...
FROM
( select * from SEED_DATA tmp
inner join my_table i on tmp.source_id = i.id) TEMP
In addition, I create the random numbers in batches, 10,000 or so in each batch(may be higher), so it will not weigh heavily on the server side, and for each batch I insert it to the table in a separate execution.
All of that because I couldn't find a good way to do what I want purely in SQL. Updating row after row is really not efficient.
My own conclusion from this story is that SQL Server is sometimes really annoying...
回答4:
You could convert a random number from the seed:
rand(row_number over (order by ___, ___,___))
Then cast that as a varchar
, Then use the last 3 characters as another seed.
That would give you a nice random value:
rand(right(cast(rand(row_number() over(x,y,x)) as varchar(15)), 3)