Random select is not always returning a single row

The intention of the following (simplified) code fragment is to return one random row. Unfortunately, when we run this fragment in the query analyzer, it returns between zero and three results.

As our input table consists of exactly 5 rows with unique IDs, and as we perform a select on this table where ID equals a random number, we are stumped as to how more than one row could ever be returned.

Note: among other things, we already tried casting the checksum result to an integer, to no avail.

DECLARE @Table TABLE (
  ID INTEGER IDENTITY (1, 1)
  , FK1 INTEGER
)

INSERT INTO @Table
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5

SELECT  *
FROM    @Table 
WHERE   ID = ABS(CHECKSUM(NEWID())) % 5 + 1

Edit

Our usage scenario is as follows (please don't comment on whether it is the right thing to do or not; the powers that be have decided).

Ultimately, we must create a result with realistic values where the combination of producer and weights is obfuscated by selecting existing weights at random from the table itself.
The query would then become something like this (which is also a reason why RAND cannot be used):

SELECT  t.ID
        , FK1 = (SELECT FK1 FROM @Table WHERE ID=ABS(CHECKSUM(NEWID())) % 5 + 1)
FROM    @Table t

Because the inner select could return zero results, it would yield a NULL value, which again is not acceptable. This question arose from investigating why the inner select returns between zero and x results.

Answer

What turned the light on for me was the simple observation that ABS(CHECKSUM(NEWID())) % 5 + 1 is re-evaluated for each row. I was under the impression that ABS(CHECKSUM(NEWID())) % 5 + 1 would get evaluated once, then matched.

Thank you all for answering and slowly but surely leading me to a better understanding.

Tags: sql sql-server sql-server-2005 random

6 answers

smile是对你的礼貌

#2 · 2019-07-15 11:10

This happens because NEWID() gives a different value for each row in the table. For each row, independently of the others, there is a one in five chance of it being returned. Consequently, as it stands, you actually have a 1 in 3125 chance of all 5 rows being returned!
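The arithmetic behind that figure can be checked with a short Python sketch. It assumes each of the 5 rows passes the filter independently with probability 1/5, as described above; the function name is made up for this illustration and is not part of the original code.

```python
from fractions import Fraction
from math import comb

ROWS = 5                      # rows in @Table
P_MATCH = Fraction(1, ROWS)   # assumed chance that a row's own draw equals its ID

def rows_returned_distribution(n=ROWS, p=P_MATCH):
    """Return {k: probability that exactly k of the n rows pass the filter}."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

dist = rows_returned_distribution()
for k, prob in dist.items():
    print(f"{k} rows: {prob} (about {float(prob):.4f})")
```

Under that assumption the query returns no rows roughly a third of the time and exactly one row about 41% of the time, while four or more rows come up in fewer than 1% of runs, which is consistent with seeing "between zero and three" results in practice.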

To see this, run the following query. You'll see that each row gets a different ID.

SELECT  * , NEWID()
FROM    @Table

This will fix your code:

DECLARE @Id int
SET @Id = ABS(CHECKSUM(NEWID())) % 5 + 1

SELECT  * 
FROM    @Table  
WHERE   ID = @Id

However, I'm not sure this is the most efficient method of selecting a single random row from the table.

You might find this MSDN article useful: http://msdn.microsoft.com/en-us/library/Aa175776 (Random Sampling in T-SQL)

EDIT 1: now I think about it, this probably is the most efficient way to do it, assuming the number of rows remains fixed and the IDs are guaranteed to be contiguous.

EDIT 2: to achieve the desired result when used as a sub-query, use TOP 1 like this:

SELECT  t.ID 
        , FK1 = (SELECT TOP 1 FK1 FROM @Table ORDER BY NEWID()) 
FROM    @Table t

我只想做你的唯一

#3 · 2019-07-15 11:11

A bit of a guess, and not sure that SQL works this way, but wouldn't SQL evaluate "ABS(CHECKSUM(NEWID())) % 5 + 1" for each row in the table? If so, then each evaluation may or may not return the value of ID of the current row.

Try this instead - generating the random number explicitly first, and matching on that single value:

declare @targetRandom int
set @targetRandom = ABS(CHECKSUM(NEWID())) % 5 + 1

select * from @table where ID = @targetRandom
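The difference between the two approaches can be imitated outside the database. The Python sketch below is only a model of the behaviour: `draw` is a stand-in for ABS(CHECKSUM(NEWID())) % 5 + 1, and the function names and the seed are chosen for this illustration.

```python
import random

IDS = [1, 2, 3, 4, 5]   # the ID column of @Table

def draw(rng):
    """Stand-in for ABS(CHECKSUM(NEWID())) % 5 + 1: a uniform integer in 1..5."""
    return rng.randint(1, 5)

def filter_per_row(rng):
    """Expression inside the WHERE clause: a fresh draw for every row."""
    return [row_id for row_id in IDS if row_id == draw(rng)]

def filter_once(rng):
    """Value stored in a variable first: one draw shared by all rows."""
    target = draw(rng)
    return [row_id for row_id in IDS if row_id == target]

rng = random.Random(2005)  # arbitrary seed so runs are repeatable
per_row_counts = [len(filter_per_row(rng)) for _ in range(10_000)]
once_counts = [len(filter_once(rng)) for _ in range(10_000)]

print("per-row draw, distinct row counts seen:", sorted(set(per_row_counts)))
print("single draw, distinct row counts seen:", sorted(set(once_counts)))
```

In this model the single-draw version always yields exactly one row, while the per-row version yields a varying number of rows, including none.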

贪生不怕死

#4 · 2019-07-15 11:13

I don't know how helpful this will be to you, but try this. As I understand it, you want one random row each time you execute the query.

select top 1 newid() as row,ID from @Table order by row

Here is the logic. Each time you execute the query, a new unique NEWID value is assigned to each row, and you then order the rows by that freshly generated value. After that, all you need to do is select the topmost row, or however many you want.
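The same idea, sketched in Python purely as an illustration: `uuid.uuid4()` plays the role of NEWID(), and the row tuples and function name are assumptions made for this example. SQL Server orders uniqueidentifier values by its own rules, so only the principle carries over, not the exact ordering.

```python
import uuid

rows = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]  # (ID, FK1) pairs as in @Table

def top_1_order_by_newid(table):
    """Tag every row with a fresh random key, sort by it, keep the first row."""
    keyed = [(uuid.uuid4().hex, row) for row in table]  # one key per row, like NEWID()
    keyed.sort(key=lambda pair: pair[0])
    return keyed[0][1]

picked = top_1_order_by_newid(rows)
print(picked)
```

Because one row is always taken from the sorted list, this approach cannot come back empty or with several rows. The cost is that every row gets a key and the whole set is sorted on each call.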

戒情不戒烟

#5 · 2019-07-15 11:24

Try the following, so you can see what happens:

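-- A column alias cannot be referenced in WHERE, and a table variable
-- needs an alias before .* can be used, so compare Number with ID by eye.
SELECT  ABS(CHECKSUM(NEWID())) % 5 + 1 AS Number, t.*
FROM    @Table t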

Rolldiameter

#6 · 2019-07-15 11:27

Or you could use RAND() instead of NEWID(). Unlike NEWID(), RAND() is only evaluated once per query in MS SQL.

If you want to use CHECKSUM to obtain a random row, this is the way to do it.

SELECT TOP 1 *
FROM @Table
ORDER BY CHECKSUM(NEWID())

For the sub-query case, what about this?

SELECT  t.ID 
        , FK1 = (SELECT TOP 1 FK1 FROM @Table ORDER BY NEWID()) 
FROM    @Table t

爷的心禁止访问

#7 · 2019-07-15 11:27

This may help you understand the reasons. Run the query multiple times. How many times does MY_FILTER equal ID?

SELECT  *, ABS(CHECKSUM(NEWID())) % 5 + 1 AS MY_FILTER
FROM    @Table

SELECT  *, ABS(CHECKSUM(NEWID())) % 5 + 1 AS MY_FILTER
FROM    @Table

SELECT  *, ABS(CHECKSUM(NEWID())) % 5 + 1 AS MY_FILTER
FROM    @Table
