I often see people answer MySQL questions with queries like this:
SELECT DAY(date), other columns
FROM table
GROUP BY DAY(date);
SELECT somecolumn, COUNT(*)
FROM table
HAVING COUNT(*) > 1;
I always like to give the column an alias and refer to that in the GROUP BY
or HAVING
clause, e.g.
SELECT DAY(date) AS day, other columns
FROM table
GROUP BY day;
SELECT somecolumn, COUNT(*) AS c
FROM table
HAVING c > 1;
Is MySQL smart enough to notice that the expressions in the later clauses are the same as in SELECT
, and only do it once? I'm not sure how to test this -- EXPLAIN
doesn't show any difference, but it doesn't seem to show how it's doing the grouping or filtering in the first place; it seems mainly useful for optimizing joins and WHERE
clauses.
I tend to be pessimistic about MySQL optimization, so I like to give it all the help I can.
I think this can be tested using sleep() function,
for example take a look at this demo: http://sqlfiddle.com/#!2/0bc1b/1
Select * FROM t;
| X |
|---|
| 1 |
| 2 |
| 2 |
SELECT x+sleep(1)
FROM t
GROUP BY x+sleep(1);
SELECT x+sleep(1) As name
FROM t
GROUP BY name;
Execution times of both queries are about 3000 ms ( 3 seconds ).
There are 3 records in the table, and for each record the query sleeps for 1 second only,
so it means that the expression is evaluated only once for each record, not twice.
After consulting with one of the MySQL engineers, I proffer this lengthy answer.
- Caching - no part of a query is 'remembered' for later use in that (or subsequent) query. (Contrast: the Query cache.)
- Common subexpression elimination - no. This is a common Compiler technique, but MySQL does not use it. Example:
(a-b)*(a-b)
will do the subtract twice.
- Removal of a constant from a loop - yes, with limitations. This is another Compiler technique.
- A variety of SQL-centric hacks - yes; see below.
- Re-evaluation of a subquery - it depends. Also, the Optimizer is gradually getting better.
VIEWs
- it depends. There are still cases where a VIEW
is destined to perform worse than the equivalent SELECT
. Example: no condition pushdown into a UNION
in a VIEW
. Actually, this is more a matter of delayed action.
- I think that some newer versions of MariaDB have a "subquery cache".
(Caveat: I do not have 100% confidence in any of my answer, but I do believe that most of it is correct, as of MySQL 5.7, MariaDB 10.1, etc)
Think of a multi-row SELECT
as a loop. Many, maybe all, "deterministic" expressions are evaluated once. Example: Constant date expressions, even involving function calls. But...
NOW()
is specifically evaluated once at the beginning of a query. Furthermore, the value is passed to Slaves when replicating. That is, by the time the query is stored on a slave, NOW()
could be out of date. (SYSDATE()
is another animal.)
Especially with the advent of only_full_group_by
, GROUP BY
needs to know if it matches the SELECT
expressions. So, this looks for similar code.
HAVING
and ORDER BY
can use aliases from the SELECT
list (unlike WHERE
and GROUP BY
). So SELECT expr AS x ... HAVING expr
seems to reevaluate expr
, but SELECT expr AS x ... HAVING x
seems to reach for the already-evaluated expr
.
The Windowing functions of MariaDB 10.2 have some pretty severe restrictions on where they can/cannot be reused; I don't have a complete picture of them yet.
Generally, none of this matters -- the re-evaluation of an expression (DATE(date)
or even COUNT(*)
) will get the same answer. Furthermore, the rummaging through the rows is usually much more costly than expression evaluation. So, unless you have a good stopwatch, you won't tell the difference.