The intention of following (simplified) code fragment is to return one random row. Unfortunatly, when we run this fragment in the query analyzer, it returns between zero and three results.
As our input table consists of exactly 5 rows with unique ID's and as we perform a select on this table where ID equals a random number, we are stumped that there would ever be more than one row returned.
Note: among other things, we already tried casting the checksum result to an integer with no avail.
DECLARE @Table TABLE (
ID INTEGER IDENTITY (1, 1)
, FK1 INTEGER
)
INSERT INTO @Table
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
SELECT *
FROM @Table
WHERE ID = ABS(CHECKSUM(NEWID())) % 5 + 1
Edit
Our usage scenario is as follows (please don't comment on wether it is the right thing to do or not. It's the powers that be that have decided)
Ultimately, we must create a result with realistic values where the combination of producer and weights are obfuscated by selecting at random existing weights from the table itself.
The query then would become something like this (also a reason why RAND
can not be used)
SELECT t.ID
, FK1 = (SELECT FK1 FROM @Table WHERE ID=ABS(CHECKSUM(NEWID())) % 5 + 1)
FROM @Table t
Because the inner select could be returning zero results, it would return a NULL value wich again is not acceptable. It is the investigation of why the inner select returns between zero and x results, that this question sproused (is this even English?).
Answer
What turned the light on for me was the simple observation that ABS(CHECKSUM(NEWID())) % 5 + 1)
was re-evaluated for each row. I was under the impression that ABS(CHECKSUM(NEWID())) % 5 + 1)
would get evaluated once, then matched.
Thank you all for answering and slowly but surely leading me to a better understanding.
The reason this happens is because NEWID() gies a different value for each row in the table. For each row, independently of the others, there is a one in five chance of it being returned. Consequently, as it stands, you actually have a 1 in 3125 chance of all 5 rows being returned!
To see this, run the following query. You'll see that each row gets a different ID.
This will fix your code:
However, I'm not sure this is the most efficient method of selecting a single random row from the table.
You might find this MSDN article useful: http://msdn.microsoft.com/en-us/library/Aa175776 (Random Sampling in T-SQL)
EDIT 1: now I think about it, this probably is the most efficient way to do it, assuming the number of rows remains fixed and the IDs are guaranteed to be contiguous.
EDIT 2: to achieve the desired result when used as a sub-query, use TOP 1 like this:
A bit of a guess, and not sure that SQL works this way, but wouldn't SQL evaluate "ABS(CHECKSUM(NEWID())) % 5 + 1" for each row in the table? If so, then each evaluation may or may not return the value of ID of the current row.
Try this instead - generating the random number explicitly first, and matching on that single value:
I don't know how much this will be helpful to you, but try this.. All I understood is you want one random row each time you execute the query..
Here is the logic. Each time you execute the query a newid is being assigned to each row and all are unique and the you just order them with the new uniquely generated rowid. Then all you need to do is select the top most or whatever you want..
Try the following, so you can see what happens:
Or you could use RAND() instead of NEWID(), which is only evaluated once per query in MS SQL
If you want to use CHECKSUM to obtain a random row, this is the way to do it.
what about?
This may help you understand the reasons. Run the query multiple times. How many times does MY_FILTER = ID ?