I have a fairly large dataset and a query that requires two joins, so efficiency of the query is very important to me. I need to retrieve 3 random rows from the database that meet a condition based on the result of a join. Most obvious solution is pointed out as inefficient here, because
[these solutions] need a sequential scan of all the table (because the random value associated with each row needs to be calculated - so that the smallest one can be determined), which can be quite slow for even medium sized tables.
However, the method suggested by the author there (SELECT * FROM table WHERE num_value >= RAND() * (SELECT MAX(num_value) FROM table) LIMIT 1
where num_value is ID) doesn't work for me because some IDs might be missing (because some rows may have been been deleted by users).
So, what would be the most efficient way to retrieve 3 random rows in my situation?
EDIT: the solution does not need to be pure SQL. I also use PHP.
Adding your RAND() call into the ORDER BY clause should allow you to ignore the ID. Try this:
After having performance issues pointed out, your best bet may be something along these lines (utilizing PHP):
At this point you should be able to select the first 3 results. The only issue with this approach is dealing with deleted rows and you might have to either bump the $rows var or add some logic to do another query in case you didn't receive at least 3 results back.
Since you don't want many results, there are a couple of interesting options using
LIMIT
andOFFSET
.I'm going to assume an
id
column which is unique and suitable for sorting.The first step is to execute a
COUNT(id)
, and then select random 3 numbers from0
toCOUNT(id) - 1
in PHP. (How to do that is a separate question, and the best approach depends on the number of rows total and the number you want).The second step has two options. Suppose the random numbers you selected are 0, 15, 2234. Either have a loop in PHP
or build a
UNION
. Note: this requires sub-selects because we're using ORDER BY.