What is a fast way to select a random row from a large mysql table?
I'm working in php, but I'm interested in any solution even if it's in another language.
What is a fast way to select a random row from a large mysql table?
I'm working in php, but I'm interested in any solution even if it's in another language.
In my case my table has an id as primary key, auto-increment with no gaps, so I can use
COUNT(*)
orMAX(id)
to get the number of rows.I made this script to test the fastest operation:
The results are:
36.8418693542479 ms
0.241041183472 ms
0.216960906982 ms
Answer with the order method:
Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.
Maybe you could do something like:
This is assuming your ID numbers are all sequential with no gaps.
The classic "SELECT id FROM table ORDER BY RAND() LIMIT 1" is actually OK.
See the follow excerpt from the MySQL manual:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.
I ran into the problem where my IDs were not sequential. What I came up with this.
The rows returned are approximately 5, but I limit it to 1.
If you want to add another WHERE clause it becomes a bit more interesting. Say you want to search for products on discount.
What you have to do is make sure you are returning enough result which is why I have it set to 100. Having a WHERE discount<.2 clause in the subquery was 10x slower, so it's better to return more results and limit.
MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)
This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).