I'm trying to generate a random time between 8:00 AM and 8:00 PM for each row that is selected from a data set, however, I always get the same random value for each row – I want it to be different for each row.
Table schema & data:
╔══════╦════════════════╗
║ ID ║ CREATED_DATE ║
╠══════╬════════════════╣
║ ID/1 ║ 26/04/2014 ║
║ ID/2 ║ 26/04/2014 ║
║ ID/3 ║ 26/04/2014 ║
║ ID/4 ║ 26/04/2014 ║
║ ID/5 ║ 26/04/2014 ║
╚══════╩════════════════╝
Сurrent SQL statement:
SELECT [ID]
, MyFunction.dbo.AddWorkDays(14, [CREATED_DATE]) AS [New Date]
, CONVERT(VARCHAR, DATEADD(MILLISECOND, CAST(43200000 * RAND() AS INT), CONVERT(TIME, '08:00')), 114) AS [New Time]
FROM [RandomTable]
Current results (same time for each row in the [New Time]
column):
╔══════╦════════════════╦════════════════╗
║ ID ║ New Date ║ New Time ║
╠══════╬════════════════╬════════════════╣
║ ID/1 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/2 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/3 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/4 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/5 ║ 10/05/2014 ║ 09:41:43 ║
╚══════╩════════════════╩════════════════╝
Desired results (different time for each row in the [New Time]
column):
╔══════╦════════════════╦════════════════╗
║ ID ║ New Date ║ New Time ║
╠══════╬════════════════╬════════════════╣
║ ID/1 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/2 ║ 10/05/2014 ║ 15:05:23 ║
║ ID/3 ║ 10/05/2014 ║ 10:01:05 ║
║ ID/4 ║ 10/05/2014 ║ 19:32:45 ║
║ ID/5 ║ 10/05/2014 ║ 08:43:15 ║
╚══════╩════════════════╩════════════════╝
Any ideas on how to fix this? All of the above is just sample data – my real table has around 2800 records (not sure if that will make a difference to anyone's suggestions).
Test this :
Interpretation of Original Question:
The question states:
Now factor in the following points:
There is some ambiguity in the following areas:
Given the information above, there are a few ways to interpret the request:
RAND(CAST(NEWID() AS VARBINARY)) * 43200
ABS(CHECKSUM(NewId()) % 43201)
CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int)
Hence, I based my answer on the idea that:
Answer:
If the situation requires unique times, that cannot be guaranteed with any method of generating truly random values. I really like the use of
CRYPT_GEN_RANDOM
by @Vladimir Baranov, but it is nearly impossible to get a unique set of values generated:Increasing the random value to 8 bytes does seem to work:
Of course, if we are generating down to the second, then there are only 86,400 of those. Reducing the scope seems to help as the following does occasionally work:
However, things get a bit trickier if the uniqueness needs per each day (which seems like a reasonable requirement of this type of project, as opposed to unique across all days). But a random number generator isn't going to know to reset at each new day.
If it is acceptable to merely have the appearance of being random, then we can guarantee uniqueness per each date without:
RAND()
,NEWID()
, orCRYPT_GEN_RANDOM()
The following solution uses the concept of Modular Multiplicative Inverses (MMI) which I learned about in this answer: generate seemingly random unique numeric ID in SQL Server . Of course, that question did not have a tightly-defined range of values like we have here with only 86,400 of them per day. So, I used a range of 86400 (as "Modulo") and tried a few "coprime" values (as "Integer") in an online calculator to get their MMIs:
I use
ROW_NUMBER()
in a CTE, partitioned (i.e. grouped) byCREATED_DATE
as a means of assigning each second of the day a value.But, while the values generated for seconds 0, 1, 2, ... and so on sequentially will appear random, across different days that particular second will map to the same value. So, the second CTE (named "WhichSecond") shifts the starting point for each date by converting the date to an INT (which converts dates to a sequential offset from 1900-01-01) and then multiply by 101.
Returns:
If we want to only generate times between 8:00 AM and 8:00 PM, we only need to make a few minor adjustments:
28800
to the second parameter of theDATEADD
as an 8 hour offsetThe result will be a change to just one line (since the others are diagnostic):
Another means of shifting each day in a less predictable fashion would be to make use of
RAND()
by passing in the INT form ofCREATED_DATE
in the "WhichSecond" CTE. This would give a stable offset per each date sinceRAND(x)
will return the same valuey
for the same value ofx
passed in, but will return a different valuey
for a different value ofx
passed in. Meaning:RAND(1) = y1
RAND(2) = y2
RAND(3) = y3
RAND(2) = y2
The second time
RAND(2)
was called, it still returned the same value ofy2
that it returned the first time it was called.Hence, the "WhichSecond" CTE could be:
You could alternatively use:
The
ABS(CHECKSUM(NewId()) % 43201)
generates a random number between0
and43200
. See Discussion here.SQL Fiddle
MS SQL Server 2008 Schema Setup:
Query 1:
Results:
There are several methods:
NEWID
function to provide a seed forRAND
. It should be used with caution, because there is no guarantee about distribution of NEWID values. One of the best methods to make it more or less uniformly distributed is via theCHECKSUM
:RAND(CHECKSUM(NEWID()))
. The good thing about this method is that NEWID function is available since SQL Server 2000.NEWID
use, say, MD5 of some column as a seed forRAND
:RAND(CHECKSUM(HASHBYTES('MD5', CAST(SomeID AS varbinary(4)))))
Or simply row number:RAND(CHECKSUM(HASHBYTES('MD5', CAST(ROW_NUMBER() OVER(ORDER BY ...) AS varbinary(4)))))
. This method is available since at least SQL Server 2005. The primary difference fromNEWID
method is that you have full control over the random sequence. You can't control whatNEWID
returns and you can't restart the random sequence from the same number again. If you supply same sets of, say, row numbers usingPARTITION BY
you'll get same sets of random numbers. It may be useful in the cases when you need to use the same sequence of random numbers several times. It is possible to get the same random number for two different seeds. I tested it for row numbers from 1 to 1,000,000.MD5
of them are all different.CHECKSUM
ofMD5
result in 122 collisions.RAND
of thisCHECKSUM
result in 246 collisions. When tested with row numbers from 1 to 100,000CHECKSUM
had 1 collision,RAND
had 3 collisions.Random
class, orRNGCryptoServiceProvider
class.CRYPT_GEN_RANDOM
.I will describe the last method in detail, because I think that it is a very good solution for SQL Server 2008 and above.
CRYPT_GEN_RANDOM
is called for each row of the result set, as opposed toRAND
, which is called only once.Besides,
CRYPT_GEN_RANDOM
should provide much better random values, thanRAND
. Better in terms of distribution and crypto-strength. Example:This generates 4 random bytes as
varbinary
. We have to explicitly cast them toint
first. Then result is transformed into a float number between 0 and 1.So, the original query would like this:
Here is a stand-alone example that is easy to copy-paste and try (I used the query from another answer by @Steve Ford):
This is the result:
Addition
When I read the original question I didn't think that it is really necessary to ensure that all generated random numbers are unique. I interpreted the word "different" in the question as a vague opposite to seeing the same number in each row of the result that you see when using a simple
SELECT RAND()
. I think that in many cases it doesn't matter if there are few colliding random numbers. In many cases it would actually be the correct behavior.So, my understanding is that when there is a need of a sequence of unique random numbers, it is in a sense equivalent to the following task. We have a set of some values/rows, for example, a set of unique IDs or all 86400 seconds of a day or 2800 rows for a given day. We want to shuffle these values/rows. We want to rearrange these rows in a random order.
To shuffle the given set of rows we simply need to
ORDER BY
random numbers (these random numbers may have reasonable amount of collisions here). Random numbers could be generated by any method. Something like this:or literally
depending on where and how it is used.
All,
I thought I'd share the answer to my question. I can't remember exactly where I found the details - I think it was via one of the links provided by sgeddes.
I used the following query to get a random time between 8am and 7:55pm (roughly)
The MyRand function is below:
I hope this helps. I haven't read many of the replies above so apologies if someone has a better answer - this is simply how I solved it.
Thanks
The issue OP had while using just
rand()
is due to it's evaluation once per query.From the documentation:
Approach that is described below removes optimization and suppresses this behavior, so
rand()
is evaluated once per row:newid()
generates unique value of typeuniqueidentifier
;cast
to be used as seed inrand([seed])
function to generate a pseudo-randomfloat
value from 0 through 1, and as seed is always unique the returning value is unique too .SQLFiddle