I want to add a column to my table with a random number using seed. If I use RAND:
select *, RAND(5) as random_id from myTable
I get an equal value(0.943597390424144 for example) for all the rows, in the random_id column. I want this value to be different for every row - and that for every time I will pass it 0.5 value(for example), it would be the same values again(as seed should work...).
How can I do this?
( For example, in PostrgreSql I can write
SELECT setseed(0.5);
SELECT t.* , random() as random_id
FROM myTable t
And I will get different values in each row. )
Edit:
After I saw the comments here, I have managed to work this out somehow - but it's not efficient at all. If someone has an idea how to improve it - it will be great. If not - I will have to find another way.
I used the basic idea of the example in here.
Creating a temporary table with blank seed value:
select * into t_myTable from (
select t.*, -1.00000000000000000 as seed
from myTable t
) as temp
Adding a random number for each seed value, one row at a time(this is the bad part...):
USE CPatterns;
GO
DECLARE @seed float;
DECLARE @id int;
DECLARE VIEW_CURSOR CURSOR FOR
select id
from t_myTable t;
OPEN VIEW_CURSOR;
FETCH NEXT FROM VIEW_CURSOR
into @id;
set @seed = RAND(5);
WHILE @@FETCH_STATUS = 0
BEGIN
set @seed = RAND();
update t_myTable set seed = @seed where id = @id
FETCH NEXT FROM VIEW_CURSOR
into @id;
END;
CLOSE VIEW_CURSOR;
DEALLOCATE VIEW_CURSOR;
GO
Creating the view using the seed value and ordering by it
create view my_view AS
select row_number() OVER (ORDER BY seed, id) AS source_id ,t.*
from t_myTable t
I think the simplest way to get a repeatable random id in a table is to use
row_number()
or a fixedid
on each row. Let me assume that you have a column calledid
with a different value on each row.The idea is just to use this as a seed:
Note that the seed for the id is an integer and not a floating point number. If you wanted a floating point seed, you could do something with
checksum()
:If you are doing this for sampling (where you will say
random_id < 0.1
for a 10% sample for instance, then I often use modulo arithmetic onrow_number()
:This returns about 10% of the numbers (okay, really 10/101). And you can adjust the sample by fiddling with the constants.
Someone sugested a similar query using newid() but I'm giving you the solution that works for me.
There's a workaround that involves newid() instead of rand, but it gives you the same result. You can execute it individually or as a column in a column. It will result in a random value per row rather than the same value for every row in the select statement. If you need a random number from 0 - N, just change 100 for the desired number.
You could convert a random number from the seed:
Then cast that as a varchar , Then use the last 3 characters as another seed. That would give you a nice random value:
So, in case it would someone someday, here's what I eventually did.
I'm generating the random seeded values in the server side(Java in my case), and then create a table with two columns: the id and the generated random_id. Now I create the view as an
inner join
between the table and the original data.The generated SQL looks something like that:
and
In addition, I create the random numbers in batches, 10,000 or so in each batch(may be higher), so it will not weigh heavily on the server side, and for each batch I insert it to the table in a separate execution.
All of that because I couldn't find a good way to do what I want purely in SQL. Updating row after row is really not efficient.
My own conclusion from this story is that SQL Server is sometimes really annoying...