可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I'm trying to generate a random time between 8:00 AM and 8:00 PM for each row that is selected from a data set, however, I always get the same random value for each row – I want it to be different for each row.
Table schema & data:
╔══════╦════════════════╗
║ ID ║ CREATED_DATE ║
╠══════╬════════════════╣
║ ID/1 ║ 26/04/2014 ║
║ ID/2 ║ 26/04/2014 ║
║ ID/3 ║ 26/04/2014 ║
║ ID/4 ║ 26/04/2014 ║
║ ID/5 ║ 26/04/2014 ║
╚══════╩════════════════╝
Сurrent SQL statement:
SELECT [ID]
, MyFunction.dbo.AddWorkDays(14, [CREATED_DATE]) AS [New Date]
, CONVERT(VARCHAR, DATEADD(MILLISECOND, CAST(43200000 * RAND() AS INT), CONVERT(TIME, '08:00')), 114) AS [New Time]
FROM [RandomTable]
Current results (same time for each row in the [New Time]
column):
╔══════╦════════════════╦════════════════╗
║ ID ║ New Date ║ New Time ║
╠══════╬════════════════╬════════════════╣
║ ID/1 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/2 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/3 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/4 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/5 ║ 10/05/2014 ║ 09:41:43 ║
╚══════╩════════════════╩════════════════╝
Desired results (different time for each row in the [New Time]
column):
╔══════╦════════════════╦════════════════╗
║ ID ║ New Date ║ New Time ║
╠══════╬════════════════╬════════════════╣
║ ID/1 ║ 10/05/2014 ║ 09:41:43 ║
║ ID/2 ║ 10/05/2014 ║ 15:05:23 ║
║ ID/3 ║ 10/05/2014 ║ 10:01:05 ║
║ ID/4 ║ 10/05/2014 ║ 19:32:45 ║
║ ID/5 ║ 10/05/2014 ║ 08:43:15 ║
╚══════╩════════════════╩════════════════╝
Any ideas on how to fix this? All of the above is just sample data – my real table has around 2800 records (not sure if that will make a difference to anyone's suggestions).
回答1:
Interpretation of Original Question:
The question states:
- Generate a random time between 8:00 AM and 8:00 PM (i.e. a 12-hour window)
- It should be different for each row (i.e. unique across all rows)
- The real table has around 2800 records
Now factor in the following points:
- Sample data shows only a single date
- There are 86,400 seconds in 24 hours, hence 43,200 seconds in 12 hours
There is some ambiguity in the following areas:
- What exactly is random within the context of "different for every row", given that truly random values cannot be guaranteed to be different for every row. In fact, truly random numbers could theoretically be the same for every row. So is the emphasis on "random" or "different"? Or are we really talking about different but not sequentially ordered (to give the appearance of randomness without actually being random)?
- What if there are ever more than 2800 rows? What if there are 1 million rows?
- If there can be more than 43,200 rows, how to handle "different for each row" (since it is not possible to have unique across all rows)?
- Will the date ever vary? If so, are we really talking about "different for each row per date"?
- If "different for each row per date":
- Can the times for each date follow the same, non-sequential pattern? Or does the pattern need to differ per each date?
- Will there ever be more than 43,200 rows for any particular date? If so, the times can only be unique per each set of 43,200 rows.
Given the information above, there are a few ways to interpret the request:
- Emphasis on "random": Dates and number of rows don't matter. Generate truly random times that are highly likely, but not guaranteed, to be unique using one of the three methods shown in the other answers:
- @notulysses:
RAND(CAST(NEWID() AS VARBINARY)) * 43200
- @Steve Ford:
ABS(CHECKSUM(NewId()) % 43201)
- @Vladimir Baranov :
CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int)
- Emphasis on "different for each row", always <= 43,200 rows: If the number of rows never exceeds the number of available seconds, it is easy to guarantee unique times across all rows, regardless of same or different dates, and appear to be randomly ordered.
- Emphasis on "different for each row", could be > 43,200 rows: If the number of rows can exceed the number of available seconds, then it is not possible to guarantee uniqueness across all rows, but it would be possible to still guarantee uniqueness across rows of any particular date, provided that no particular date has > 43,200 rows.
Hence, I based my answer on the idea that:
- Even if the number of rows for the O.P. never exceeds 2800, it is more likely that most others who are encountering a similar need for randomness would have a larger data set to work with (i.e. there could easily be 1 million rows, for any number of dates: 1, 5000, etc.)
- Either the sample data is overly simplistic in using the same date for all 5 rows, or even if the date is the same for all rows in this particular case, in most other cases that is less likely to happen
- Uniqueness is to be favored over Randomness
- If there is a pattern to the "seemingly random" ordering of the seconds for each date, there should at least be a varying offset to the start of the sequence across the dates (when the dates are ordered sequentially) to give the appearance of randomness between any small grouping of dates.
Answer:
If the situation requires unique times, that cannot be guaranteed with any method of generating truly random values. I really like the use of CRYPT_GEN_RANDOM
by @Vladimir Baranov, but it is nearly impossible to get a unique set of values generated:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
INSERT INTO @Table (Col1)
SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(4))
FROM [master].sys.objects so
CROSS JOIN [master].sys.objects so2
CROSS JOIN [master].sys.objects so3;
-- 753,571 rows
Increasing the random value to 8 bytes does seem to work:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
INSERT INTO @Table (Col1)
SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(8))
FROM [master].sys.objects so
CROSS JOIN [master].sys.objects so2
CROSS JOIN [master].sys.objects so3;
-- 753,571 rows
Of course, if we are generating down to the second, then there are only 86,400 of those. Reducing the scope seems to help as the following does occasionally work:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE);
INSERT INTO @Table (Col1)
SELECT TOP (86400) CONVERT(BIGINT, CRYPT_GEN_RANDOM(4))
FROM [master].sys.objects so
CROSS JOIN [master].sys.objects so2
CROSS JOIN [master].sys.objects so3;
However, things get a bit trickier if the uniqueness needs per each day (which seems like a reasonable requirement of this type of project, as opposed to unique across all days). But a random number generator isn't going to know to reset at each new day.
If it is acceptable to merely have the appearance of being random, then we can guarantee uniqueness per each date without:
- looping / cursor constructs
- saving already used values in a table
- using
RAND()
, NEWID()
, or CRYPT_GEN_RANDOM()
The following solution uses the concept of Modular Multiplicative Inverses (MMI) which I learned about in this answer: generate seemingly random unique numeric ID in SQL Server . Of course, that question did not have a tightly-defined range of values like we have here with only 86,400 of them per day. So, I used a range of 86400 (as "Modulo") and tried a few "coprime" values (as "Integer") in an online calculator to get their MMIs:
- 13 (MMI = 39877)
- 37 (MMI = 51373)
- 59 (MMI = 39539)
I use ROW_NUMBER()
in a CTE, partitioned (i.e. grouped) by CREATED_DATE
as a means of assigning each second of the day a value.
But, while the values generated for seconds 0, 1, 2, ... and so on sequentially will appear random, across different days that particular second will map to the same value. So, the second CTE (named "WhichSecond") shifts the starting point for each date by converting the date to an INT (which converts dates to a sequential offset from 1900-01-01) and then multiply by 101.
DECLARE @Data TABLE
(
ID INT NOT NULL IDENTITY(1, 1),
CREATED_DATE DATE NOT NULL
);
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05');
INSERT INTO @Data (CREATED_DATE) VALUES ('2015-03-15');
INSERT INTO @Data (CREATED_DATE) VALUES ('2016-10-22');
INSERT INTO @Data (CREATED_DATE) VALUES ('2015-03-15');
;WITH cte AS
(
SELECT tmp.ID,
CONVERT(DATETIME, tmp.CREATED_DATE) AS [CREATED_DATE],
ROW_NUMBER() OVER (PARTITION BY tmp.CREATED_DATE ORDER BY (SELECT NULL))
AS [RowNum]
FROM @Data tmp
), WhichSecond AS
(
SELECT cte.ID,
cte.CREATED_DATE,
((CONVERT(INT, cte.[CREATED_DATE]) - 29219) * 101) + cte.[RowNum]
AS [ThisSecond]
FROM cte
)
SELECT parts.*,
(parts.ThisSecond % 86400) AS [NormalizedSecond], -- wrap around to 0 when
-- value goes above 86,400
((parts.ThisSecond % 86400) * 39539) % 86400 AS [ActualSecond],
DATEADD(
SECOND,
(((parts.ThisSecond % 86400) * 39539) % 86400),
parts.CREATED_DATE
) AS [DateWithUniqueTime]
FROM WhichSecond parts
ORDER BY parts.ID;
Returns:
ID CREATED_DATE ThisSecond NormalizedSecond ActualSecond DateWithUniqueTime
1 2014-10-05 1282297 72697 11483 2014-10-05 03:11:23.000
2 2014-10-05 1282298 72698 51022 2014-10-05 14:10:22.000
3 2014-10-05 1282299 72699 4161 2014-10-05 01:09:21.000
4 2014-10-05 1282300 72700 43700 2014-10-05 12:08:20.000
5 2014-10-05 1282301 72701 83239 2014-10-05 23:07:19.000
6 2015-03-15 1298558 2558 52762 2015-03-15 14:39:22.000
7 2016-10-22 1357845 61845 83055 2016-10-22 23:04:15.000
8 2015-03-15 1298559 2559 5901 2015-03-15 01:38:21.000
If we want to only generate times between 8:00 AM and 8:00 PM, we only need to make a few minor adjustments:
- Change the range (as "Modulo") from 86400 to half of it: 43200
- Recalculate the MMI (can use the same "coprime" values as "Integer"): 39539 (same as before)
- Add
28800
to the second parameter of the DATEADD
as an 8 hour offset
The result will be a change to just one line (since the others are diagnostic):
-- second parameter of the DATEADD() call
28800 + (((parts.ThisSecond % 43200) * 39539) % 43200)
Another means of shifting each day in a less predictable fashion would be to make use of RAND()
by passing in the INT form of CREATED_DATE
in the "WhichSecond" CTE. This would give a stable offset per each date since RAND(x)
will return the same value y
for the same value of x
passed in, but will return a different value y
for a different value of x
passed in. Meaning:
RAND(1) = y1
RAND(2) = y2
RAND(3) = y3
RAND(2) = y2
The second time RAND(2)
was called, it still returned the same value of y2
that it returned the first time it was called.
Hence, the "WhichSecond" CTE could be:
(
SELECT cte.ID,
cte.CREATED_DATE,
(RAND(CONVERT(INT, cte.[CREATED_DATE])) * {some number}) + cte.[RowNum]
AS [ThisSecond]
FROM cte
)
回答2:
The issue OP had while using just rand()
is due to it's evaluation once per query.
From the documentation:
If seed is not specified, the SQL Server Database Engine assigns a seed value at random. For a specified seed value, the result returned is always the same.
Approach that is described below removes optimization and suppresses this behavior, so rand()
is evaluated once per row:
dateadd( second
, rand(cast(newid() as varbinary)) * 43200
, cast('08:00:00' as time) )
newid()
generates unique value of type uniqueidentifier
;
- the value is converted with
cast
to be used as seed in rand([seed])
function to generate a pseudo-random float
value from 0 through 1, and as seed is always unique the returning value is unique too .
SQLFiddle
回答3:
You could alternatively use:
SELECT DATEADD(s, ABS(CHECKSUM(NewId()) % 43201), CAST('08:00:00' AS Time))
The ABS(CHECKSUM(NewId()) % 43201)
generates a random number between 0
and 43200
. See Discussion here.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
Query 1:
SELECT DATEADD(s, ABS(CHECKSUM(NewId()) % 43201), CAST('08:00:00' AS Time)) AS [RandomTime]
FROM
( VALUES (1), (2), (3), (4), (5)
) Y(A)
CROSS JOIN
( VALUES (1), (2), (3), (4), (5)
) Z(A)
Results:
| RANDOMTIME |
|------------------|
| 16:51:58.0000000 |
| 10:42:44.0000000 |
| 14:01:38.0000000 |
| 13:33:51.0000000 |
| 18:00:51.0000000 |
| 11:29:03.0000000 |
| 10:21:14.0000000 |
| 16:38:27.0000000 |
| 09:55:37.0000000 |
| 13:21:13.0000000 |
| 11:29:37.0000000 |
| 10:57:49.0000000 |
| 14:56:42.0000000 |
| 15:33:11.0000000 |
| 18:49:45.0000000 |
| 16:23:28.0000000 |
| 09:00:05.0000000 |
| 09:20:01.0000000 |
| 11:26:23.0000000 |
| 15:26:23.0000000 |
| 10:38:44.0000000 |
| 11:46:30.0000000 |
| 16:00:59.0000000 |
| 09:29:18.0000000 |
| 09:09:19.0000000 |
回答4:
There are several methods:
- Generate a table with random numbers in advance and use it whenever needed. Or take this data from some reputable source.
- Various combinations that use
NEWID
function to provide a seed for RAND
. It should be used with caution, because there is no guarantee about distribution of NEWID values. One of the best methods to make it more or less uniformly distributed is via the CHECKSUM
: RAND(CHECKSUM(NEWID()))
. The good thing about this method is that NEWID function is available since SQL Server 2000.
- Instead of
NEWID
use, say, MD5 of some column as a seed for RAND
: RAND(CHECKSUM(HASHBYTES('MD5', CAST(SomeID AS varbinary(4)))))
Or simply row number: RAND(CHECKSUM(HASHBYTES('MD5', CAST(ROW_NUMBER() OVER(ORDER BY ...) AS varbinary(4)))))
. This method is available since at least SQL Server 2005. The primary difference from NEWID
method is that you have full control over the random sequence. You can't control what NEWID
returns and you can't restart the random sequence from the same number again. If you supply same sets of, say, row numbers using PARTITION BY
you'll get same sets of random numbers. It may be useful in the cases when you need to use the same sequence of random numbers several times. It is possible to get the same random number for two different seeds. I tested it for row numbers from 1 to 1,000,000. MD5
of them are all different. CHECKSUM
of MD5
result in 122 collisions. RAND
of this CHECKSUM
result in 246 collisions. When tested with row numbers from 1 to 100,000 CHECKSUM
had 1 collision, RAND
had 3 collisions.
- Another possibility is to simply implement your own user-defined function in T-SQL that generates a random number using your preferred algorithm. In this case you have full control of everything. Usually pseudo random generators have to store their internal state between invocations, so you may end up with having a dedicated table that stores this data.
- You can write your user-defined function using CLR. In this case you can implement your own generator, or use functions built-into .NET, like
Random
class, or RNGCryptoServiceProvider
class.
- At last, since SQL Server 2008 there is a built-in function
CRYPT_GEN_RANDOM
.
I will describe the last method in detail, because I think that it is a very good solution for SQL Server 2008 and above. CRYPT_GEN_RANDOM
is called for each row of the result set, as opposed to RAND
, which is called only once.
CRYPT_GEN_RANDOM (Transact-SQL)
Returns a cryptographic random number generated by the Crypto API
(CAPI). The output is a hexadecimal number of the specified number of
bytes.
Besides, CRYPT_GEN_RANDOM
should provide much better random values, than RAND
. Better in terms of distribution and crypto-strength. Example:
(CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5)
This generates 4 random bytes as varbinary
. We have to explicitly cast them to int
first. Then result is transformed into a float number between 0 and 1.
So, the original query would like this:
SELECT ID AS [ID]
, MyFunction.dbo.AddWorkDays(14, S.CREATED_DATE) AS [New Date]
, CONVERT(VARCHAR, DATEADD(MILLISECOND,
CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int),
CONVERT(TIME, '08:00')), 114) AS [New Time]
FROM RandomTable
Here is a stand-alone example that is easy to copy-paste and try (I used the query from another answer by @Steve Ford):
SELECT DATEADD(millisecond,
CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int),
CAST('08:00:00' AS Time)) AS [RandomTime]
FROM
( VALUES (1), (2), (3), (4), (5)
) Y(A)
CROSS JOIN
( VALUES (1), (2), (3), (4), (5)
) Z(A)
This is the result:
RandomTime
10:58:24.7200000
19:40:06.7220000
11:04:29.0530000
08:57:31.6130000
15:03:14.9470000
09:15:34.9380000
13:46:43.1250000
11:27:00.8940000
14:42:23.6100000
15:07:56.2120000
11:39:09.8830000
08:16:44.3960000
14:23:38.4820000
17:28:31.7440000
16:29:31.4320000
09:09:15.0210000
12:31:09.8370000
11:23:09.8430000
15:35:45.5480000
17:42:49.3390000
08:07:05.4930000
18:17:16.2980000
11:49:08.2010000
10:20:21.7620000
15:56:58.6110000
Addition
When I read the original question I didn't think that it is really necessary to ensure that all generated random numbers are unique.
I interpreted the word "different" in the question as a vague opposite to seeing the same number in each row of the result that you see when using a simple SELECT RAND()
.
I think that in many cases it doesn't matter if there are few colliding random numbers. In many cases it would actually be the correct behavior.
So, my understanding is that when there is a need of a sequence of unique random numbers, it is in a sense equivalent to the following task.
We have a set of some values/rows, for example, a set of unique IDs or all 86400 seconds of a day or 2800 rows for a given day.
We want to shuffle these values/rows. We want to rearrange these rows in a random order.
To shuffle the given set of rows we simply need to ORDER BY
random numbers (these random numbers may have reasonable amount of collisions here). Random numbers could be generated by any method. Something like this:
ROW_NUMBER() OVER ([optional PARTITION BY ...] ORDER BY CRYPT_GEN_RANDOM(4))
or literally
SELECT ...
FROM ...
ORDER BY CRYPT_GEN_RANDOM(4)
depending on where and how it is used.
回答5:
Test this :
Declare @t table(ID int,CREATED_DATE datetime)
insert into @t values
(1 , '04/26/2014'),
(2 , '04/26/2014'),
(3 , '04/26/2014'),
(4 , '04/26/2014')
;WITH CTE AS
(
SELECT *,CONVERT(VARCHAR, DATEADD(SECOND, RAND(CAST(NEWID() AS VARBINARY)) * 43200,
CAST('08:00:00' AS TIME)),114) AS [New Time] FROM @t WHERE ID=1
UNION ALL
SELECT *,CONVERT(VARCHAR, DATEADD(SECOND, RAND(CAST(NEWID() AS VARBINARY)) * 43200,
CAST('08:00:00' AS TIME)), 114) FROM @t WHERE ID>1 AND ID<=5
)
SELECT * FROM CTE
回答6:
Here's another option that gives you a bit more control over how the time is generated.
You can specify the interval between the random times.
It also doesn't make use of the RAND
function.
DECLARE @StartTime VARCHAR(10) = '08:00',
@EndTime VARCHAR(10) = '20:00',
@Interval INT = 5 --(In Seconds)
WITH times AS(
SELECT CONVERT(TIME, @StartTime) AS t
UNION ALL
SELECT DATEADD(SECOND, @Interval, t)
FROM times
WHERE t < @EndTime
)
SELECT *,
(SELECT TOP 1 t FROM times WHERE d.Id > 0 ORDER BY NEWID())
FROM #data d
option (maxrecursion 0)
On a side note :
If you remove the WHERE
clause in the subquery above (WHERE d.Id > 0
), the same time value is returned for all the rows, i.e. the same problem that you started with
回答7:
All,
I thought I'd share the answer to my question. I can't remember exactly where I found the details - I think it was via one of the links provided by sgeddes.
I used the following query to get a random time between 8am and 7:55pm (roughly)
SELECT convert(varchar,CONVERT(varchar, DATEADD(ms, dbo.MyRand(335 ,830) * 86400, 0), 114),114)
The MyRand function is below:
SET ANSI_NULLS ON;
GO
SET QUOTED_IDENTIFIER ON;
GO
CREATE FUNCTION dbo.myRand(@Min INT, @Max INT) RETURNS decimal(18,15) AS
BEGIN
DECLARE @BinaryFloat BINARY(8)
SELECT @BinaryFloat = CAST(Id AS BINARY) FROM vwGuid
DECLARE
@PartValue TINYINT,
@Mask TINYINT,
@Mantissa FLOAT,
@Exponent SMALLINT,
@Bit TINYINT,
@Ln2 FLOAT,
@BigValue BIGINT,
@RandomNumber FLOAT
SELECT
@Mantissa = 1,
@Bit = 1,
@Ln2 = LOG(2),
@BigValue = CAST(@BinaryFloat AS BIGINT),
@Exponent = (@BigValue & 0x7ff0000000000000) / EXP(52 * @Ln2)
WHILE @Part <= 8
BEGIN
SELECT
@PartValue = CAST(SUBSTRING(@BinaryFloat, @Part, 1) AS TINYINT),
@Mask =
WHILE @Mask > 0
BEGIN
IF @PartValue & @Mask > 0
SET @Mantissa = @Mantissa + EXP(-@Bit * @Ln2)
SELECT
@Mask = @Mask / 2
END
END
SET @RandomNumber = CASE @Exponent WHEN 0 THEN 0 ELSE CAST(@Exponent AS FLOAT) / 2047 END
RETURN CAST((@RandomNumber * (@Max - @Min)) + @Min AS DECIMAL(18,15))
END
GO
END
I hope this helps. I haven't read many of the replies above so apologies if someone has a better answer - this is simply how I solved it.
Thanks