I am doing a subquery in which I have a calculated column involving random number generation. In the base query I select this column twice. MySQL 5.6 works as I expect, the calculated value being called once and fixed. The 5.7+/8.0+ execution seems to re-evaluate the subquery's column value individually for each selection. Is this correct behavior? What can I do to force it work as expected in newer versions of MySQL?
CREATE TABLE t (
`id` BIGINT(20) NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE=InnoDB;
insert into t values();
insert into t values();
insert into t values();
insert into t values();
insert into t values();
SELECT
q.i,
q.r,
q.r
FROM (
SELECT
id AS i,
(FLOOR(RAND(100) * 4)) AS r
FROM t
) q;
MySQL 5.6 yields (values are the same):
+---+-----+-----+
| i | r | r |
+---+-----+-----+
| 1 | 0 | 0 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 2 | 2 |
| 5 | 1 | 1 |
+---+-----+-----+
while 5.7 yields (values are different):
+---+-----+-----+
| i | r | r |
+---+-----+-----+
| 1 | 0 | 2 |
| 2 | 3 | 2 |
| 3 | 1 | 1 |
| 4 | 2 | 1 |
| 5 | 2 | 0 |
+---+-----+-----+
As mentioned in The MySQL 8.0.0 Milestone Release is available,
In MySQL 5.6 and earlier, derived tables were always materialized. In
5.7, derived tables are merged into the outer query in most cases, and materialized in some cases.
...
Enabling merging a derived table or view through a optimizer hint (WL#9307) — This work by Guilhem Bichot allows users to control whether a derived table or view will be merged or materialized using the “merge” and “no_merge” hints.
I suppose that this is the cause of the behavior I am observing in newer versions of MySQL.
The mentioned hint can be used with MySQL 8.0 to force RAND() be called only once:
SELECT /* NO_MERGE(q) */
q.i,
q.r,
q.r
FROM (
SELECT
id AS i,
(FLOOR(RAND(100) * 4)) AS r
FROM t
) AS q;
+---+-----+-----+
| i | r | r |
+---+-----+-----+
| 1 | 0 | 0 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 2 | 2 |
| 5 | 1 | 1 |
+---+-----+-----+
This however is not available in 5.7. To achieve the desired behavior with 5.7, add LIMIT <a very high number>
to the derived table definition (I'm using signed LONG_MAX below). Thanks to Roy Lyseng for this workaround.
SELECT
q.i,
q.r,
q.r
FROM (
SELECT
id AS i,
(FLOOR(RAND(100) * 4)) AS r
FROM t LIMIT 9223372036854775807
) AS q;
+---+-----+-----+
| i | r | r |
+---+-----+-----+
| 1 | 0 | 0 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 2 | 2 |
| 5 | 1 | 1 |
+---+-----+-----+
As philipxy mentioned in the comment, the result of a query expression must be strictly defined regardless of any optimizations being applied. Which means it is an optimizer bug in MySQL 5.7/8.0.