How to echo random rows from database?

2020-02-01 15:50发布

I have a database table with about 160 million rows in it.

The table has two columns: id and listing.

I simply need to used PHP to display 1000 random rows from the listing column and put them into <span> tags. Like this:

<span>Row 1</span>
<span>Row 2</span>
<span>Row 3</span>

I've been trying to do it with ORDER BY RAND() but that takes so long to load on such a large database and I haven't been able to find any other solutions.

I'm hoping that there is a fast/easy way to do this. I can't imagine that it'd be impossible to simply echo 1000 random rows... Thanks!

标签: php mysql
5条回答
何必那么认真
2楼-- · 2020-02-01 16:04

Two solutions presented here. Both of these proposed solutions are mysql-only and can be used by any programming language as the consumer. PHP would be wildly too slow for this, but it could be the consumer of it.

Faster Solution: I can bring 1000 random rows from a table of 19 million rows in about 2 tenths of a second with more advanced programming techniques.

Slower Solution: It takes about 15 seconds with non-power programming techniques.

By the way both use the data generation seen HERE that I wrote. So that is my little schema. I use that, continue with TWO more self-inserts seen over there, until I have 19M rows. So I am not going to show that again. But to get those 19M rows, go see that, and do 2 more of those inserts, and you have 19M rows.

Slower version first

First, the slower method.

select id,thing from ratings order by rand() limit 1000;

That returns 1000 rows in 15 seconds.


For anyone new to mysql, don't even read the following.

Faster solution

This is a little more complicated to describe. The gist of it is that you pre-compute your random numbers and generate an in clause ending of random numbers, separated by commas, and wrapped with a pair of parentheses.

It will look like (1,2,3,4) but it will have 1000 numbers in it.

And you store them, and use them once. Like a one time pad for cryptography. Ok, not a great analogy, but you get the point I hope.

Think of it as an ending for an in clause, and stored in a TEXT column (like a blob).

Why in the world would one want to do this? Because RNG (random number generators) are prohibitively slow. But to generate them with a few machines may be able to crank out thousands relatively quickly. By the way (and you will see this in the structure of my so called appendices, I capture how long it takes to generate one row. About 1 second with mysql. But C#, PHP, Java, anything can put that together. The point is not how you put it together, rather, that you have it when you want it.

This strategy, the long and short of it is, when this is combined with fetching a row that has not been used as a random list, marking it as used, and issuing a call such as

select id,thing from ratings where id in (a,b,c,d,e, ... )

and the in clause has 1000 numbers in it, the results are available in less than half a second. Effective employing the mysql CBO (cost based optimizer) than treats it like a join on a PK index.

I leave this in summary form, because it is a bit complicated in practice, but includes the following particles potentially

  • a table holding the precomputed random numbers (Appendix A)
  • a mysql create event strategy (Appendix B)
  • a stored procedure that employees a Prepared Statement (Appendix C)
  • a mysql-only stored proc to demonstrate RNG in clause for kicks (Appendix D)

Appendix A

A table holding the precomputed random numbers

create table randomsToUse
(   -- create a table of 1000 random numbers to use
    -- format will be like a long "(a,b,c,d,e, ...)" string

    -- pre-computed random numbers, fetched upon needed for use

    id int auto_increment primary key,
    used int not null,  -- 0 = not used yet, 1= used
    dtStartCreate datetime not null, -- next two lines to eyeball time spent generating this row
    dtEndCreate datetime not null,
    dtUsed datetime null, -- when was it used
    txtInString text not null -- here is your in clause ending like (a,b,c,d,e, ... )
    -- this may only have about 5000 rows and garbage cleaned
    -- so maybe choose one or two more indexes, such as composites
);

Appendix B

In the interest of not turning this into a book, see my answer HERE for a mechanism for running a recurring mysql Event. It will drive the maintenance of the table seen in Appendix A using techniques seen in Appendix D and other thoughts you want to dream up. Such as re-use of rows, archiving, deleting, whatever.

Appendix C

stored procedure to simply get me 1000 random rows.

DROP PROCEDURE if exists showARandomChunk;
DELIMITER $$
CREATE PROCEDURE showARandomChunk
(
)
BEGIN
  DECLARE i int;
  DECLARE txtInClause text;

  -- select now() into dtBegin;

  select id,txtInString into i,txtInClause from randomsToUse where used=0 order by id limit 1;
  -- select txtInClause as sOut; -- used for debugging

  -- if I run this following statement, it is 19.9 seconds on my Dell laptop
  -- with 19M rows
  -- select * from ratings order by rand() limit 1000; -- 19 seconds

  -- however, if I run the following "Prepared Statement", if takes 2 tenths of a second
  -- for 1000 rows

  set @s1=concat("select * from ratings where id in ",txtInClause);

  PREPARE stmt1 FROM @s1;
  EXECUTE stmt1; -- execute the puppy and give me 1000 rows
  DEALLOCATE PREPARE stmt1;
END
$$
DELIMITER ;

Appendix D

Can be intertwined with Appendix B concept. However you want to do it. But it leaves you with something to see how mysql could do it all by itself on the RNG side of things. By the way, for parameters 1 and 2 being 1000 and 19M respectively, it takes 800 ms on my machine.

This routine could be written in any language as mentioned in the beginning.

drop procedure if exists createARandomInString;
DELIMITER $$
create procedure createARandomInString
(   nHowMany int, -- how many numbers to you want
    nMaxNum int -- max of any one number
)
BEGIN
    DECLARE dtBegin datetime;
    DECLARE dtEnd datetime;
    DECLARE i int;
    DECLARE txtInClause text;
    select now() into dtBegin;

    set i=1;
    set txtInClause="(";
    WHILE i<nHowMany DO
        set txtInClause=concat(txtInClause,floor(rand()*nMaxNum)+1,", "); -- extra space good due to viewing in text editor
        set i=i+1;
    END WHILE;
    set txtInClause=concat(txtInClause,floor(rand()*nMaxNum)+1,")");
    -- select txtInClause as myOutput; -- used for debugging
    select now() into dtEnd;

    -- insert a row, that has not been used yet
    insert randomsToUse(used,dtStartCreate,dtEndCreate,dtUsed,txtInString) values 
       (0,dtBegin,dtEnd,null,txtInClause);
END
$$
DELIMITER ;

How to call the above stored proc:

call createARandomInString(1000,18000000);

That generates and saves 1 row, of 1000 numbers wrapped as described above. Big numbers, 1 to 18M

As a quick illustration, if one were to modify the stored proc, un-rem the line near the bottom that says "used for debugging", and have that as the last line, in the stored proc that runs, and run this:

call createARandomInString(4,18000000);

... to generate 4 random numbers up to 18M, the results might look like

+-------------------------------------+
| myOutput                            |
+-------------------------------------+
| (2857561,5076608,16810360,14821977) |
+-------------------------------------+

Appendix E

Reality check. These are somewhat advanced techniques and I can't tutor anyone on them. But I wanted to share them anyway. But I can't teach it. Over and out.

查看更多
Lonely孤独者°
3楼-- · 2020-02-01 16:19

Please use mysql rand in your query during select statement. Your query will be look like

SELECT * FROM `table` ORDER BY RAND() LIMIT 0,1;
查看更多
叛逆
4楼-- · 2020-02-01 16:25

You want to use the rand function in php. The signature is

rand(min, max);

so, get the number of rows in your table to a $var and set that as your max. A way to do this with SQL is

SELECT COUNT(*) FROM table_name;

then simply run a loop to generate 1000 rands with the above function and use them to get specific rows.

If the IDs are not sequential but if they are close, you can simply test each rand ID to see if there is a hit. If they are far apart, you could pull the entire ID space into php and then randomly sample from that distribution via something like

$random = rand(0, count($rows)-1);

for an array of IDs in $rows.

查看更多
够拽才男人
5楼-- · 2020-02-01 16:26

ORDER BY RAND() is a mysql function working fine with small databases, but if you run anything larger then 10k rows, you should build functions inside your program instead of using mysql premade functions or organise your data in special manners.

My suggestion: keep your mysql data indexed by auto increment id, or add other incremental and unique row.

Then build a select function:

<?php
//get total number of rows
$result = mysql_query('SELECT `id` FROM `table_name`', $link); 
$num_rows = mysql_num_rows($result); 

$randomlySelected = [];

for( $a = 0; $a < 1000; $a ++ ){

        $randomlySelected[$a] = rand(1,$num_rows);

}

//then select data by random ids
$where = "";

$control = 0;
foreach($randomlySelected as $key => $selectedID){

    if($control == 0){

        $where .= "`id` = '". $selectedID ."' ";

    } else {

        $where .= "OR `id` = '". $selectedID ."'";

    }
    $control ++;
}


$final_query = "SELECT * FROM `table_name` WHERE ". $where .";";
$final_results = mysql_query($final_query);    

?>

If some of your incremental IDs out of that 160 million database are missing, then you can easily add a function to add another random IDs (a while loop probably) if an array of randomly selected ids consists of less then required.

Let me know if you need some further help.

查看更多
\"骚年 ilove
6楼-- · 2020-02-01 16:29

If your RAND() function is too slow, and you only need quasi-random records (for a test sample) and not truly random ones, you can always make a fast, effectively-random group by sorting by middle characters (using SUBSTRING) in indexed fields. For example, sorting by the 7th digit of a phone number...in descending order...and then by the 6th digit...in ascending order...that's already quasi-random. You could do the same with character columns: the 6th character in a person's name is going to be meaningless/random, etc.

查看更多
登录 后发表回答