quick selection of a random row from a large table

2018-12-31 09:24发布

What is a fast way to select a random row from a large mysql table?

I'm working in php, but I'm interested in any solution even if it's in another language.

24条回答
步步皆殇っ
2楼-- · 2018-12-31 10:01

In my case my table has an id as primary key, auto-increment with no gaps, so I can use COUNT(*) or MAX(id) to get the number of rows.

I made this script to test the fastest operation:

logTime();
query("SELECT COUNT(id) FROM tbl");
logTime();
query("SELECT MAX(id) FROM tbl");
logTime();
query("SELECT id FROM tbl ORDER BY id DESC LIMIT 1");
logTime();

The results are:

  • Count: 36.8418693542479 ms
  • Max: 0.241041183472 ms
  • Order: 0.216960906982 ms

Answer with the order method:

SELECT FLOOR(RAND() * (
    SELECT id FROM tbl ORDER BY id DESC LIMIT 1
)) n FROM tbl LIMIT 1

...
SELECT * FROM tbl WHERE id = $result;
查看更多
初与友歌
3楼-- · 2018-12-31 10:04

Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.

SET @r := (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM mytable)));
SET @sql := CONCAT('SELECT * FROM mytable LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
查看更多
还给你的自由
4楼-- · 2018-12-31 10:06

Maybe you could do something like:

SELECT * FROM table 
  WHERE id=
    (FLOOR(RAND() * 
           (SELECT COUNT(*) FROM table)
          )
    );

This is assuming your ID numbers are all sequential with no gaps.

查看更多
旧时光的记忆
5楼-- · 2018-12-31 10:06

The classic "SELECT id FROM table ORDER BY RAND() LIMIT 1" is actually OK.

See the follow excerpt from the MySQL manual:

If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result.

查看更多
唯独是你
6楼-- · 2018-12-31 10:06

I ran into the problem where my IDs were not sequential. What I came up with this.

SELECT * FROM products WHERE RAND()<=(5/(SELECT COUNT(*) FROM products)) LIMIT 1

The rows returned are approximately 5, but I limit it to 1.

If you want to add another WHERE clause it becomes a bit more interesting. Say you want to search for products on discount.

SELECT * FROM products WHERE RAND()<=(100/(SELECT COUNT(*) FROM pt_products)) AND discount<.2 LIMIT 1

What you have to do is make sure you are returning enough result which is why I have it set to 100. Having a WHERE discount<.2 clause in the subquery was 10x slower, so it's better to return more results and limit.

查看更多
冷夜・残月
7楼-- · 2018-12-31 10:07

MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)

This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).

查看更多
登录 后发表回答