I have a query that is using ORDER BY RAND()
but it takes too long and it's getting worse as data is growing.
The query joins two tables and it returns 5 random products and a random image of each product
Table 1 - Products
product_id - pk auto-inc
name
description
Data
1 - product 1 - description
2 - product 2 - description
Table 2 - ProductImages
image_id - pk auto-inc
product_id - fk index
filename
Data
1 - 1 - product 1 image
2 - 1 - product 1 image
3 - 1 - product 1 image
4 - 2 - product 2 image
...
I've read this and this but cannot find a way to optimize the query so i'm asking for help.
Thanks in advance.
ORDER BY RAND()
is slow because the DBMS has to read all rows, sort them all, just to keep only a few rows. So the performance of this query heavily depends on the number of rows in the table, and decreases as the number of rows increase.
There is no way to optimize that.
There are alternatives, however:
You can implement "get 5 random rows" by doing 6 queries:
- get number of rows in table (you can cache this one)
do 5 queries with OFFSET <random offset from 0 to $number_of_rows-1> LIMIT 1
(i.e. read and return only one row from some random offset)
For example: SELECT * FROM Products OFFSET 42 LIMIT 1
(note: without joining, for now)
Such queries are very fast and run in a time virtually independent from the table size.
This should be much faster than ORDER BY RAND()
.
Now, to get a random Image for each random Product:
SELECT *
FROM (
SELECT *
FROM Products
OFFSET 42 LIMIT 1
) p
JOIN ProductImages pi
ON pi.product_id = p.id
ORDER BY RAND()
LIMIT 1
The inner query is still fast, and the outer is only sorting few rows (assuming there are few images per product), and so can still use order by rand().