How can I request a random row (or as close to truly random as is possible) in pure SQL?
For MySQL, to get a random record, sort by RAND() and limit the result to one row.
More detail: http://jan.kneschke.de/projects/mysql/order-by-rand/
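A minimal sketch of the "sort by a random value, keep one row" idea, run against an in-memory SQLite database so it can be tried without a server. The `quotes` table and the `random_quote` helper are made up for illustration. SQLite spells the function `RANDOM()`; in MySQL the same query would use `ORDER BY RAND() LIMIT 1`.

```python
import sqlite3

# Hypothetical demo table; any table with a few rows would do.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO quotes (body) VALUES (?)",
    [(f"quote {i}",) for i in range(1, 101)],
)


def random_quote(conn):
    """Return one row chosen by sorting the whole table on a random key.

    MySQL equivalent: SELECT id, body FROM quotes ORDER BY RAND() LIMIT 1
    """
    return conn.execute(
        "SELECT id, body FROM quotes ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


print(random_quote(conn))
```

This is simple, but the engine has to generate a random value for every row and sort them all, so it gets slow on large tables.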
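The best way is to store a random value in a new, indexed column just for that purpose. To fetch a row, generate a random number r and select the first row whose random column is greater than or equal to r, ordered by that column.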
This is the solution employed by the MediaWiki code. Of course, there is some bias against smaller values, but they found that it was sufficient to wrap the random value around to zero when no rows are fetched.
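The random-column approach with wrap-around could look roughly like this. It is a sketch under assumptions, not the MediaWiki code: the `pages` table, the `rand_key` column, and both helper functions are hypothetical names, and SQLite stands in for whatever engine you use.

```python
import random
import sqlite3


def make_pages(conn, n):
    """Create a demo table whose rows each carry a precomputed random key."""
    conn.execute(
        "CREATE TABLE pages ("
        "id INTEGER PRIMARY KEY, title TEXT, rand_key REAL NOT NULL)"
    )
    # The index lets the lookup below avoid sorting the whole table.
    conn.execute("CREATE INDEX pages_rand_key ON pages (rand_key)")
    conn.executemany(
        "INSERT INTO pages (title, rand_key) VALUES (?, ?)",
        [(f"page {i}", random.random()) for i in range(n)],
    )


def random_page(conn, pivot=None):
    """Return the first row at or above a random pivot, wrapping to zero."""
    if pivot is None:
        pivot = random.random()
    query = (
        "SELECT id, title, rand_key FROM pages "
        "WHERE rand_key >= ? ORDER BY rand_key LIMIT 1"
    )
    row = conn.execute(query, (pivot,)).fetchone()
    if row is None:
        # The pivot was above every stored key: wrap around to zero.
        row = conn.execute(query, (0.0,)).fetchone()
    return row


conn = sqlite3.connect(":memory:")
make_pages(conn, 1000)
print(random_page(conn))
```

Each lookup touches only the index, at the cost of one extra column. Rows that sit after a large gap in `rand_key` are picked more often, so the result is only approximately uniform.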
The newid() solution may require a full table scan so that each row can be assigned a new GUID, which will perform much worse.
The rand() solution may not work at all (e.g. with MSSQL) because the function is evaluated just once, so every row is assigned the same "random" number.
You may also try the newid() function. Just write your query and order by newid(). It is quite random. SQL's built-in random function could help too. If you want to limit the result to just one row, add that restriction at the end of the query.
Most of the solutions here aim to avoid sorting, but they still need to make a sequential scan over a table.
There is also a way to avoid the sequential scan by switching to an index scan. If you know the index value of your random row, you can get the result almost instantly. The problem is how to guess an index value.
The following approach works on PostgreSQL 8.4: guess 10 random index values in the range 0 .. [last value of id] and return the first row whose id matches one of them.
The number 10 is arbitrary; you may use 100 or 1000, as it (amazingly) doesn't have a big impact on the response time.
There is one more problem: if you have sparse ids, every guess might miss. The solution is to have a backup plan :) In this case, a plain old ORDER BY random() query, combined with the first query through UNION ALL under a single LIMIT 1.
Note the UNION ALL clause. In this case, if the first part returns any data, the second one is NEVER executed!
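The guess-then-fall-back idea can be sketched as follows. This is not the PostgreSQL query from the answer: the `items` table and the function name are hypothetical, SQLite is used so the example runs anywhere, ids are guessed from 1 rather than 0, and the fallback is an explicit second query in Python instead of a UNION ALL.

```python
import random
import sqlite3


def random_row_by_id_guess(conn, guesses=10):
    """Try a handful of random ids via the primary key; fall back to a sort."""
    last_id = conn.execute("SELECT MAX(id) FROM items").fetchone()[0]
    if last_id is None:
        return None  # empty table

    candidates = [random.randint(1, last_id) for _ in range(guesses)]
    placeholders = ",".join("?" * guesses)
    hits = conn.execute(
        f"SELECT id, name FROM items WHERE id IN ({placeholders})",
        candidates,
    ).fetchall()
    if hits:
        # Pick among the hits in Python so the lowest id is not favoured.
        return random.choice(hits)

    # Backup plan for sparse ids: the slow but reliable full sort.
    return conn.execute(
        "SELECT id, name FROM items ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item {i}") for i in range(1, 10001, 3)],  # deliberately sparse ids
)
print(random_row_by_id_guess(conn))
```

With dense ids the fallback almost never runs, so the cost stays close to a few primary-key lookups. With very sparse ids most calls end up in the fallback, and raising `guesses` is the cheap way to reduce that.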
If possible, use stored statements to avoid the inefficiency of both indexes on RND() and creating a record number field.