Optimizing MySQL ORDER BY RAND()

Posted 2019-08-31 06:22

I can't find a good answer to my problem.

I have a MySQL query with an inner join, an ORDER BY RAND() and a LIMIT X. When I remove the ORDER BY RAND(), the query is 10 times faster. Is there a more efficient way to get a random subset of 500 rows? Here's a sample query:

Select * from table1 
inner join table2 on table1.in = table2.in
where table1.T = A
order by rand()
limit 500;

Tags: mysql limit
2 answers
家丑人穷心不美
Reply #2 · 2019-08-31 07:07

This should help:

Select *
from table1 inner join
     table2
     on table1.in = table2.in
where table1.T = A and rand() < 1000.0/20000.0  -- 20000.0 = rough number of rows the join matches
order by rand()
limit 500

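This limits the result set to about 1,000 random rows (assuming the join matches about 20,000 rows; adjust the denominator to your own row count) before ORDER BY RAND() picks a random sample of 500 from them. Sorting about 1,000 rows is much cheaper than sorting all of them. Asking for more rows than you need is just a safety margin, so that you still end up with at least 500.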
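Instead of hard-coding 20000, you could derive the ratio from a row count you already have. The sketch below is an in-memory Python simulation, not MySQL code. It mimics the `rand() < ratio` prefilter followed by `order by rand() limit 500`. The names `prefilter_ratio` and `sample_with_prefilter` and the default 2x oversampling factor are illustrative assumptions.

```python
import random


def prefilter_ratio(wanted, total_rows, oversample=2.0):
    """Cutoff for `rand() < ratio` so that roughly oversample * wanted
    rows survive the prefilter (capped at 1.0 for small tables)."""
    if total_rows <= 0:
        return 1.0
    return min(1.0, oversample * wanted / total_rows)


def sample_with_prefilter(rows, wanted, oversample=2.0, rng=None):
    """Simulate: WHERE rand() < ratio ORDER BY rand() LIMIT wanted."""
    rng = rng or random.Random()
    ratio = prefilter_ratio(wanted, len(rows), oversample)
    survivors = [row for row in rows if rng.random() < ratio]  # WHERE rand() < ratio
    rng.shuffle(survivors)                                      # ORDER BY rand()
    return survivors[:wanted]                                   # LIMIT wanted


if __name__ == "__main__":
    rows = list(range(20000))  # stand-in for the ~20,000 joined rows
    print(prefilter_ratio(500, len(rows)))  # prints 0.05, same as 1000.0/20000.0
    sample = sample_with_prefilter(rows, 500, rng=random.Random(42))
    print(len(sample))
```

With 20,000 rows and a cutoff of 0.05, the expected number of survivors is 1,000 with a standard deviation of about 31. Ending up with fewer than 500 is therefore extremely unlikely. A smaller oversampling factor means less to sort but a higher risk of a short sample.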

Here is an alternative strategy, building on the "create your own indexes" approach.

Create a temporary table using the following query:

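create temporary table results as
(select table1.*, @rn := @rn + 1 as rn
 -- name any table2 columns you need here: with select *, the shared column
 -- `in` appears twice and CREATE TABLE ... SELECT fails with a duplicate column error
from table1 inner join
     table2
     on table1.in = table2.in cross join
     (select @rn := 0) const
where table1.T = A
);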

You now have a row-number column, and you can get the number of rows with:

select @rn;

Then you can generate random row numbers between 1 and @rn in your application and fetch those rows from results with where rn in (...).
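Generating the ids in the application could look like the Python sketch below. It assumes the value of `select @rn` has already been fetched into `total`. It draws distinct row numbers in 1..total (rn starts at 1 because @rn is initialized to 0) and builds a parameterized `where rn in (...)` query against the `results` temporary table. The helper names are hypothetical. The `%s` placeholder style is an assumption that suits DB-API drivers such as PyMySQL or MySQL Connector/Python.

```python
import random


def random_row_numbers(total, wanted, rng=None):
    """Pick up to `wanted` distinct row numbers from 1..total, sorted."""
    rng = rng or random.Random()
    k = min(wanted, total)
    return sorted(rng.sample(range(1, total + 1), k))


def build_fetch_query(row_numbers):
    """Build a parameterized query; pass row_numbers as its parameters."""
    if not row_numbers:
        raise ValueError("need at least one row number")
    placeholders = ", ".join(["%s"] * len(row_numbers))
    return f"select * from results where rn in ({placeholders})"


if __name__ == "__main__":
    ids = random_row_numbers(total=20000, wanted=500, rng=random.Random(1))
    sql = build_fetch_query(ids)
    # cursor.execute(sql, ids)  # hypothetical cursor on the SAME connection
    print(len(ids), sql[:40])
```

Temporary tables are visible only to the session that created them. The fetch therefore has to run on the same connection as the `create temporary table` statement.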

I would be inclined to keep the processing in the database, using these two queries:

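create temporary table results as
(select table1.*, @rn := @rn + 1 as rn, rand() as therand
 -- name any table2 columns you need here: with select *, the shared column
 -- `in` appears twice and CREATE TABLE ... SELECT fails with a duplicate column error
from table1 inner join
     table2
     on table1.in = table2.in cross join
     (select @rn := 0) const
where table1.T = A
);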

select *
from results
where therand < 1000/@rn
order by therand
limit 500;
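Here is why the `therand < 1000/@rn` filter doesn't change the result. `order by therand limit 500` keeps the 500 rows with the smallest keys. The filter only discards rows whose keys are too large to be among them, as long as at least 500 rows pass it. It just means only about 1,000 rows have to be sorted. The in-memory Python check below illustrates this on simulated keys. `tag_rows` and `pick` are hypothetical names, and 20,000 rows is an arbitrary test size.

```python
import random


def tag_rows(rows, rng):
    """Attach a uniform random key to each row, like `rand() as therand`."""
    return [(rng.random(), row) for row in rows]


def pick(tagged, wanted, threshold=None):
    """ORDER BY therand LIMIT wanted, optionally after WHERE therand < threshold."""
    if threshold is not None:
        tagged = [t for t in tagged if t[0] < threshold]
    ordered = sorted(tagged, key=lambda t: t[0])
    return [row for _, row in ordered[:wanted]]


if __name__ == "__main__":
    rng = random.Random(7)
    tagged = tag_rows(range(20000), rng)
    n = len(tagged)                               # plays the role of @rn
    full = pick(tagged, 500)                      # sorts every row
    fast = pick(tagged, 500, threshold=1000 / n)  # sorts only ~1,000 rows
    print(full == fast)
```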
相关推荐>>
Reply #3 · 2019-08-31 07:19

A good way is to do it at the application level in two steps:

  1. Get the row count of your dataset and generate random numbers between 0 and count - 1.
  2. Use the numbers from step 1 as offsets in your LIMIT clause.

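The two steps above could be sketched in Python as follows. The sketch relies on assumptions the answer doesn't spell out. First, a `select count(*) ...` query has already returned `count`. Second, one random offset is drawn per row wanted, and each one is fetched with `limit offset, 1`. A single `limit offset, 500` would instead return 500 consecutive rows from a random starting point, which is not a random subset. The helper names, the `table1.id` ordering column and the `%s` placeholder style (DB-API drivers such as PyMySQL) are all hypothetical.

```python
import random


def random_offsets(count, wanted, rng=None):
    """Step 1: draw distinct offsets in 0..count-1, one per row wanted."""
    rng = rng or random.Random()
    return rng.sample(range(count), min(wanted, count))


# Step 2: one single-row query per offset. `table1.id` is a hypothetical
# primary key; without a stable ORDER BY, an offset has no fixed meaning.
FETCH_ONE = (
    "select * from table1 inner join table2 on table1.in = table2.in "
    "where table1.T = %s order by table1.id limit %s, 1"
)


def fetch_plan(count, wanted, t_value, rng=None):
    """Return (sql, params) pairs that a DB-API cursor could execute."""
    return [(FETCH_ONE, (t_value, off)) for off in random_offsets(count, wanted, rng)]


if __name__ == "__main__":
    plan = fetch_plan(count=20000, wanted=500, t_value="A", rng=random.Random(3))
    print(len(plan), plan[0][1])
```

Each query is cheap for small offsets. MySQL still reads and discards the skipped rows, though, so hundreds of queries with large offsets can end up slower than the original ORDER BY RAND(). That is why measuring matters here.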
Try it and measure whether the performance is acceptable for you.
