Quick selection of a random row from a large table

Posted 2018-12-31 09:24

What is a fast way to select a random row from a large MySQL table?

I'm working in PHP, but I'm interested in any solution, even if it's in another language.

24 answers
零度萤火
#2 · 2018-12-31 09:46

To select multiple random rows from a given table (say 'words'), our team came up with this, where n stands for the number of rows wanted:

SELECT *
  FROM `words` AS r1
  JOIN (SELECT MAX(`WordID`) AS wid_c FROM `words`) AS tmp1
 WHERE r1.WordID >= (SELECT (RAND() * tmp1.wid_c) AS id)
 LIMIT n
唯独是你
#3 · 2018-12-31 09:49

Add an indexed column containing a precomputed random value to each row, and use that column in the ORDER BY clause, limiting the result to one row. This works out faster than the full table scan and sort that ORDER BY RAND() causes.

Update: You still need to compute a random value before issuing the SELECT statement, of course, e.g.

SELECT * FROM `foo` WHERE `foo_rand` >= {some random value} ORDER BY `foo_rand` LIMIT 1
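
A minimal sketch of this idea, using Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL. Only the names foo and foo_rand come from the answer above; the index name, the payload column, the wrap-around fallback, and the helper names make_table and pick_random_row are assumptions added for illustration.

```python
import random
import sqlite3


def make_table(conn, n_rows, rng):
    """Create `foo` with a precomputed random value per row and an index on it."""
    conn.execute(
        "CREATE TABLE foo (id INTEGER PRIMARY KEY, payload TEXT, foo_rand REAL)"
    )
    conn.executemany(
        "INSERT INTO foo (payload, foo_rand) VALUES (?, ?)",
        [(f"row-{i}", rng.random()) for i in range(n_rows)],
    )
    # Assumption: without an index the lookup below would still scan the table.
    conn.execute("CREATE INDEX idx_foo_rand ON foo (foo_rand)")


def pick_random_row(conn, rng):
    """Return the row with the smallest foo_rand at or above a fresh threshold."""
    threshold = rng.random()
    row = conn.execute(
        "SELECT id, payload, foo_rand FROM foo "
        "WHERE foo_rand >= ? ORDER BY foo_rand LIMIT 1",
        (threshold,),
    ).fetchone()
    if row is None:
        # The threshold landed above every stored value: wrap to the smallest one.
        row = conn.execute(
            "SELECT id, payload, foo_rand FROM foo ORDER BY foo_rand LIMIT 1"
        ).fetchone()
    return row


conn = sqlite3.connect(":memory:")
rng = random.Random(42)
make_table(conn, 1000, rng)
print(pick_random_row(conn, rng))
```

Note that each row's chance of being picked depends on the gap between its stored value and the next smaller one, so the selection is only approximately uniform unless the stored values are refreshed from time to time.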
有味是清欢
4楼-- · 2018-12-31 09:51

I'm a bit new to SQL but how about generating a random number in PHP and using

SELECT * FROM the_table WHERE primary_key >= $randNr

This doesn't solve the problem with holes in the table, though.

But here's a twist on lassevk's suggestion:

SELECT primary_key FROM the_table

Use mysql_num_rows() in PHP to create a random number based on the result above, then run:

SELECT * FROM the_table WHERE primary_key = rand_number

On a side note, just how slow is it to run SELECT * FROM the_table, create a random number based on mysql_num_rows(), and then move the data pointer to that row with mysql_data_seek()? How slow will this be on large tables with, say, a million rows?
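
The key-list variant described above can be sketched as follows, again with Python's sqlite3 module standing in for MySQL and PHP. The sample keys, the name column, and the helper pick_by_key_list are hypothetical; only the_table and primary_key come from the answer.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (primary_key INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO the_table (primary_key, name) VALUES (?, ?)",
    [(k, f"name-{k}") for k in (1, 2, 5, 9, 10)],  # 3, 4, 6, 7, 8 are holes
)


def pick_by_key_list(conn, rng):
    """Fetch every key, choose one on the client, then load that row."""
    keys = [k for (k,) in conn.execute("SELECT primary_key FROM the_table")]
    if not keys:
        return None
    chosen = rng.choice(keys)  # only existing keys can be chosen, so holes are harmless
    return conn.execute(
        "SELECT primary_key, name FROM the_table WHERE primary_key = ?",
        (chosen,),
    ).fetchone()


print(pick_by_key_list(conn, random.Random()))
```

This is uniform over the rows that exist, but it transfers every key to the client on each call, so the cost grows linearly with the table size.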

永恒的永恒
#5 · 2018-12-31 09:52

Take a look at this link by Jan Kneschke or this SO answer as they both discuss the same question. The SO answer goes over various options also and has some good suggestions depending on your needs. Jan goes over all the various options and the performance characteristics of each. He ends up with the following for the most optimized method by which to do this within a MySQL select:

SELECT name
  FROM random AS r1 JOIN
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1;
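
One property of this kind of query worth checking is how it behaves when the ids have gaps. The sketch below is not taken from the linked article; it is a small experiment, assuming an illustrative table named items with ids 1, 2, 3 and 10, run against SQLite through Python's sqlite3 module. The random factor is drawn in Python rather than with RAND() so the run can be seeded.

```python
import random
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"name-{i}") for i in (1, 2, 3, 10)],  # ids 4..9 are missing
)

# Same shape as the query above: first row whose id is >= random * MAX(id).
QUERY = """
SELECT r1.id
  FROM items AS r1
 WHERE r1.id >= ? * (SELECT MAX(id) FROM items)
 ORDER BY r1.id ASC
 LIMIT 1
"""


def pick(conn, rng):
    """Run the query once with a random factor in [0, 1)."""
    return conn.execute(QUERY, (rng.random(),)).fetchone()[0]


rng = random.Random(7)
counts = Counter(pick(conn, rng) for _ in range(5000))
for row_id in sorted(counts):
    print(row_id, counts[row_id])
```

The row that follows the gap (id 10 here) absorbs every threshold that falls inside the gap, so it is returned far more often than the others. With densely numbered ids the effect is small.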

HTH,

-Dipin

旧人旧事旧时光
#6 · 2018-12-31 09:52

SELECT DISTINCT * FROM yourTable WHERE 4 = 4 LIMIT 1;

路过你的时光
#7 · 2018-12-31 09:54

Quick and dirty method:

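SET @COUNTER = (SELECT COUNT(*) FROM your_table);
SET @OFFSET = FLOOR(RAND() * @COUNTER);

-- LIMIT/OFFSET do not accept expressions, so bind the offset in a prepared statement
PREPARE stmt FROM 'SELECT PrimaryKey FROM your_table LIMIT 1 OFFSET ?';
EXECUTE stmt USING @OFFSET;
DEALLOCATE PREPARE stmt;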

The complexity of the first query is O(1) for MyISAM tables.

The second query requires a full table scan, so its complexity is O(n).
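
The count-then-offset approach can be sketched from client code as follows. This uses Python's sqlite3 module as a stand-in for MySQL; the sample keys and the helper name pick_by_offset are hypothetical, and the ORDER BY is an addition that makes the offset refer to a stable row order.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (PrimaryKey INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO your_table (PrimaryKey) VALUES (?)",
    [(k,) for k in (3, 8, 15, 16, 42)],
)


def pick_by_offset(conn, rng):
    """Count the rows, then skip a random number of them and take the next one."""
    (count,) = conn.execute("SELECT COUNT(*) FROM your_table").fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)  # integer in [0, count)
    (key,) = conn.execute(
        "SELECT PrimaryKey FROM your_table ORDER BY PrimaryKey LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()
    return key


print(pick_by_offset(conn, random.Random()))
```

Every existing row is equally likely regardless of gaps in the keys, at the price of the server walking past the skipped rows.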

Dirty and quick method:

Keep a separate table for this purpose only. Whenever you insert a row into the original table, also insert its key into this table. Assumption: no DELETEs.

CREATE TABLE Aux(
  MyPK INT AUTO_INCREMENT PRIMARY KEY,  -- an AUTO_INCREMENT column must be a key
  PrimaryKey INT
);

SET @MaxPK = (SELECT MAX(MyPK) FROM Aux);
SET @RandPK = FLOOR(1 + RAND() * @MaxPK);  -- integer in 1..@MaxPK
SET @PrimaryKey = (SELECT PrimaryKey FROM Aux WHERE MyPK = @RandPK);

If DELETEs are allowed,

SET @delta = FLOOR(@RandPK / 10);

SET @PrimaryKey = (SELECT PrimaryKey
                   FROM Aux
                   WHERE MyPK BETWEEN @RandPK - @delta AND @RandPK + @delta
                   LIMIT 1);

The overall complexity is O(1).
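
A rough sketch of the auxiliary-table lookup for the no-DELETE case, using Python's sqlite3 module as a stand-in for MySQL. The sample keys and the helper name pick_via_aux are hypothetical; SQLite's AUTOINCREMENT plays the role of MySQL's AUTO_INCREMENT here.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Aux (MyPK INTEGER PRIMARY KEY AUTOINCREMENT, PrimaryKey INTEGER)"
)
# Keys of the original table, sparse on purpose; MyPK is assigned densely as 1..4.
conn.executemany(
    "INSERT INTO Aux (PrimaryKey) VALUES (?)",
    [(k,) for k in (101, 205, 999, 1500)],
)


def pick_via_aux(conn, rng):
    """Pick a dense MyPK at random and map it back to the original key."""
    (max_pk,) = conn.execute("SELECT MAX(MyPK) FROM Aux").fetchone()
    if max_pk is None:
        return None  # Aux is empty
    rand_pk = rng.randint(1, max_pk)  # inclusive on both ends
    row = conn.execute(
        "SELECT PrimaryKey FROM Aux WHERE MyPK = ?", (rand_pk,)
    ).fetchone()
    return row[0] if row else None  # None only if MyPK has gaps (after DELETEs)


print(pick_via_aux(conn, random.Random()))
```

Because MyPK has no gaps as long as nothing is deleted, both queries are index lookups and each original key is equally likely.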
