I've got a SQL Server table with about 50,000 rows in it. I want to select about 5,000 of those rows at random. I've thought of a complicated way, creating a temp table with a "random number" column, copying my table into that, looping through the temp table and updating each row with RAND()
, and then selecting from that table where the random number column < 0.1. I'm looking for a simpler way to do it, in a single statement if possible.
This article suggest using the NEWID()
function. That looks promising, but I can't see how I could reliably select a certain percentage of rows.
Anybody ever do this before? Any ideas?
Try this:
I was using it in subquery and it returned me same rows in subquery
then i solved with including parent table variable in where
Note the where condtition
The server-side processing language in use (eg PHP, .net, etc) isn't specified, but if it's PHP, grab the required number (or all the records) and instead of randomising in the query use PHP's shuffle function. I don't know if .net has an equivalent function but if it does then use that if you're using .net
ORDER BY RAND() can have quite a performance penalty, depending on how many records are involved.
Just order the table by a random number and obtain the first 5,000 rows using
TOP
.UPDATE
Just tried it and a
newid()
call is sufficent - no need for all the casts and all the math.In MySQL you can do this:
Selecting Rows Randomly from a Large Table on MSDN has a simple, well-articulated solution that addresses the large-scale performance concerns.