I have a database table with about 160 million rows in it.
The table has two columns: id
and listing
.
I simply need to used PHP to display 1000 random rows from the listing
column and put them into <span>
tags. Like this:
<span>Row 1</span>
<span>Row 2</span>
<span>Row 3</span>
I've been trying to do it with ORDER BY RAND()
but that takes so long to load on such a large database and I haven't been able to find any other solutions.
I'm hoping that there is a fast/easy way to do this. I can't imagine that it'd be impossible to simply echo 1000 random rows... Thanks!
Two solutions presented here. Both of these proposed solutions are mysql-only and can be used by any programming language as the consumer. PHP would be wildly too slow for this, but it could be the consumer of it.
Faster Solution: I can bring 1000 random rows from a table of 19 million rows in about 2 tenths of a second with more advanced programming techniques.
Slower Solution: It takes about 15 seconds with non-power programming techniques.
By the way both use the data generation seen HERE that I wrote. So that is my little schema. I use that, continue with TWO more self-inserts seen over there, until I have 19M rows. So I am not going to show that again. But to get those 19M rows, go see that, and do 2 more of those inserts, and you have 19M rows.
Slower version first
First, the slower method.
That returns 1000 rows in 15 seconds.
Faster solution
This is a little more complicated to describe. The gist of it is that you pre-compute your random numbers and generate an
in clause
ending of random numbers, separated by commas, and wrapped with a pair of parentheses.It will look like
(1,2,3,4)
but it will have 1000 numbers in it.And you store them, and use them once. Like a one time pad for cryptography. Ok, not a great analogy, but you get the point I hope.
Think of it as an ending for an
in
clause, and stored in a TEXT column (like a blob).Why in the world would one want to do this? Because RNG (random number generators) are prohibitively slow. But to generate them with a few machines may be able to crank out thousands relatively quickly. By the way (and you will see this in the structure of my so called appendices, I capture how long it takes to generate one row. About 1 second with mysql. But C#, PHP, Java, anything can put that together. The point is not how you put it together, rather, that you have it when you want it.
This strategy, the long and short of it is, when this is combined with fetching a row that has not been used as a random list, marking it as used, and issuing a call such as
and the in clause has 1000 numbers in it, the results are available in less than half a second. Effective employing the mysql CBO (cost based optimizer) than treats it like a join on a PK index.
I leave this in summary form, because it is a bit complicated in practice, but includes the following particles potentially
in
clause for kicks (Appendix D)Appendix A
A table holding the precomputed random numbers
Appendix B
In the interest of not turning this into a book, see my answer HERE for a mechanism for running a recurring mysql Event. It will drive the maintenance of the table seen in Appendix A using techniques seen in Appendix D and other thoughts you want to dream up. Such as re-use of rows, archiving, deleting, whatever.
Appendix C
stored procedure to simply get me 1000 random rows.
Appendix D
Can be intertwined with Appendix B concept. However you want to do it. But it leaves you with something to see how mysql could do it all by itself on the RNG side of things. By the way, for parameters 1 and 2 being 1000 and 19M respectively, it takes 800 ms on my machine.
This routine could be written in any language as mentioned in the beginning.
How to call the above stored proc:
That generates and saves 1 row, of 1000 numbers wrapped as described above. Big numbers, 1 to 18M
As a quick illustration, if one were to modify the stored proc, un-rem the line near the bottom that says "used for debugging", and have that as the last line, in the stored proc that runs, and run this:
... to generate 4 random numbers up to 18M, the results might look like
Appendix E
Reality check. These are somewhat advanced techniques and I can't tutor anyone on them. But I wanted to share them anyway. But I can't teach it. Over and out.
Please use mysql rand in your query during select statement. Your query will be look like
You want to use the
rand
function in php. The signature isso, get the number of rows in your table to a $var and set that as your
max
. A way to do this with SQL isthen simply run a loop to generate 1000 rands with the above function and use them to get specific rows.
If the IDs are not sequential but if they are close, you can simply test each rand ID to see if there is a hit. If they are far apart, you could pull the entire ID space into php and then randomly sample from that distribution via something like
for an array of IDs in
$rows
.ORDER BY RAND() is a mysql function working fine with small databases, but if you run anything larger then 10k rows, you should build functions inside your program instead of using mysql premade functions or organise your data in special manners.
My suggestion: keep your mysql data indexed by auto increment
id
, or add other incremental and unique row.Then build a select function:
If some of your incremental IDs out of that 160 million database are missing, then you can easily add a function to add another random IDs (a while loop probably) if an array of randomly selected ids consists of less then required.
Let me know if you need some further help.
If your RAND() function is too slow, and you only need quasi-random records (for a test sample) and not truly random ones, you can always make a fast, effectively-random group by sorting by middle characters (using SUBSTRING) in indexed fields. For example, sorting by the 7th digit of a phone number...in descending order...and then by the 6th digit...in ascending order...that's already quasi-random. You could do the same with character columns: the 6th character in a person's name is going to be meaningless/random, etc.