I've got a MySQL table with a bunch of entries in it, and a column called "Multiplier." The default (and most common) value for this column is 0, but it could be any number.
What I need to do is select a single entry from that table at random. However, the rows are weighted according to the number in the "Multiplier" column. A value of 0 means that it's not weighted at all. A value of 1 means that it's weighted twice as much, as if the entry were in the table twice. A value of 2 means that it's weighted three times as much, as if the entry were in the table three times.
I'm trying to modify what my developers have already given me, so sorry if the setup doesn't make a whole lot of sense. I could probably change it but want to keep as much of the existing table setup as possible.
I've been trying to figure out how to do this with SELECT and RAND(), but don't know how to do the weighting. Is it possible?
The result of the pseudo-code
(rand(1, num) % rand(1, num))
will get more toward 0 and less toward num. Subtract the result from num to get the opposite.So if my application language is PHP, it should look something like this:
Explanation of the code above:
It's also achievable merely by using MySQL.
Proving that the pseudo-code
(rand(1, num) % rand(1, num))
will weight toward 0: Execute the following PHP code to see why (in this example, 16 is the highest number):For a much better performance (specially on big tables), first index the weight column and use this query:
Two subqueries are used because MySQL doesn't support LIMIT in the first subquery yet.
On 40MB table the usual query takes 1s on my i7 machine and this one takes 0.04s.
if you want to select N rows...
$rand
pointing.previously selected rows should be excluded on each selection loop.
where ... id not in (3, 5);
For others Googling this subject, I believe you can also do something like this:
The total sum of weights for all records must be n-1, and @RAND should be a random value between 0 and n-1 inclusive.
@RAND could be set in SQL or inserted as a integer value from the calling code.
The subselect will sum up all the preceeding records' weights, checking it it exceeds the random value supplied.
This guy asks the same question. He says the same as Frank, but the weightings don't come out right and in the comments someone suggests using
ORDER BY -LOG(1.0 - RAND()) / Multiplier
, which in my testing gave pretty much perfect results.(If any mathematicians out there want to explain why this is correct, please enlighten me! But it works.)
The disadvantage would be that you couldn't set the weighting to 0 to temporarily disable an option, as you would end up dividing by zero. But you could always filter it out with a
WHERE Multiplier > 0
.Don't use 0, 1 and 2 but 1, 2 and 3. Then you can use this value as a multiplier: