Using `rand()` with `having`

2020-08-19 05:37发布

问题:

I have a table that contains a list of records. Each iteration, a random set of these must be picked, starting from a specific offset. Each row has a chance to be picked (so e.g. new or not often picked rows are picked more).

However, something doesn't work, causing rows to be returned that do not satisfy a condition using an aliased rand().

I'm attempting to use the following query:

select
    id,
    probability,
    rand() rolledChance
from records
where id > :offset
having rolledChance < probability;

Where :offset is a prepared statement parameter, and is the last scanned id in the last iteration for this user.

On a table created like this (which is the relevant subset of the table):

CREATE TABLE records (id INT, probability FLOAT);

Where probability is a value between 0 and 1 on the table records. However, this returns rows where the condition does not satisfy. I checked this with the following query:

select
    *,
    x.rolledChance < x.probability shouldPick
from
    (select
        id,
        probability,
        rand() rolledChance
    from records
    having rolledChance < probability
) x;

A few rows returned are:

id      probability     rolledChance            shouldPick
12      0.546358        0.015139976530466207    1
26      0.877424        0.9730734508233829      0
46      0.954425        0.35213605347288407     1

When I repurpose the second query as follows, it works as expected, and only returns rows where rolledChance is actually lower than probability:

select
    *,
    x.rolledChance < x.probability shouldPick
from
    (select id, probability, rand() rolledChance from records) x
where rolledChance < probability;

So what am I missing? Are the probability and rolledChance used differently than I thought in the comparison? Is the rand() evaluated every time the alias is used in the same query?

Version output: mysql Ver 15.1 Distrib 10.0.28-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2, running on Debian Jessie.

回答1:

I think the problem is that HAVING is applied after GROUP BY, but still before the SELECT phase. I realise it's confusing because the HAVING clause references a column from the SELECT statement, but I think it basically just executes whatever is in the SELECT statement twice - once for the having, and then again for the SELECT.

Eg, see this answer.

Note, it's especially confusing because if you refer to a column name that doesn't appear in the SELECT statement in a HAVING clause it'll throw an error.

Eg, this fiddle

But as per that fiddle above, it'll still let you actually filter based on the result of a function that doesn't appear in the output. Long story short, the HAVING clause is still doing what you want, but you can't both filter on a random value and display it at the same time using that approach. If you need to do that, you need to use a subquery to fix the value first, then the outer query can filter and display on it.

Also, to make it clear, it's probably worth just using RAND() in the having clause, not the SQL part. Though I get that this question is asking why it's doing this rather than trying to solve the problem specifically.



标签: mysql mariadb