Optimizing slow ORDER BY RAND() query

2019-02-16 00:38发布

I have a query that is using ORDER BY RAND() but it takes too long and it's getting worse as data is growing.

The query joins two tables and it returns 5 random products and a random image of each product

Table 1 - Products

product_id - pk auto-inc
name 
description

Data

1 - product 1 - description
2 - product 2 - description

Table 2 - ProductImages

image_id   - pk auto-inc
product_id - fk index
filename

Data

1 - 1 - product 1 image
2 - 1 - product 1 image
3 - 1 - product 1 image
4 - 2 - product 2 image

...

I've read this and this but cannot find a way to optimize the query so i'm asking for help. Thanks in advance.

标签: mysql random
1条回答
仙女界的扛把子
2楼-- · 2019-02-16 01:09

ORDER BY RAND() is slow because the DBMS has to read all rows, sort them all, just to keep only a few rows. So the performance of this query heavily depends on the number of rows in the table, and decreases as the number of rows increase.

There is no way to optimize that.

There are alternatives, however:

You can implement "get 5 random rows" by doing 6 queries:

  • get number of rows in table (you can cache this one)
  • do 5 queries with OFFSET <random offset from 0 to $number_of_rows-1> LIMIT 1 (i.e. read and return only one row from some random offset)

    For example: SELECT * FROM Products OFFSET 42 LIMIT 1 (note: without joining, for now)

    Such queries are very fast and run in a time virtually independent from the table size.

This should be much faster than ORDER BY RAND().


Now, to get a random Image for each random Product:

SELECT *
FROM (
    SELECT *
    FROM Products
    OFFSET 42 LIMIT 1
) p
JOIN ProductImages pi
ON   pi.product_id = p.id
ORDER BY RAND()
LIMIT 1

The inner query is still fast, and the outer is only sorting few rows (assuming there are few images per product), and so can still use order by rand().

查看更多
登录 后发表回答