I was surprised to find that the following works:
SELECT name, height * weight as inchpounds
FROM sashelp.class
HAVING inchpounds > 5000;
Since I thought the HAVING clause was processed strictly before the SELECT clause. Is this in fact a false assumption on my part? Or is this peculiar to SAS's implementation of SQL?
Strictly speaking, and if I remember well, the SQL standard define HAVING
clause to be processed before the SELECT
clause. So, an alias defined in the SELECT, cannot be used in HAVING
clause.
See my answer in a related question: Using 'case expression column' in where clause for the order of proccessing a SELECT
statement, which is:
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY
So, in most SQL incarnations, that query will yield an error. MySQL is one exception I know that allows this deviation from the standard (and where SELECT
aliases are allowed to be used in the GROUP BY
clause as well).
As @a_horse_with_no_name mentioned in comments, the query would fail in most SQL products for another reason, too: HAVING
without GROUP BY
means there is an aggregation over all the rows and thus only aggregate functions would be allowed in both the HAVING
and the SELECT
clauses.
After looking at the SAS SQL documentation and examples, it seems the query you posted above is valid in SAS SQL and executed as this (in standard SQL):
SELECT name, height * weight AS inchpounds
FROM sashelp.class
WHERE height * weight > 5000;
If I understand it correctly, you may also mix aggregate functions with non-aggregated columns, like this:
SELECT name, height,
MAX(height) AS max_height
FROM sashelp.class ;
which would be evaluated as (in standard SQL):
SELECT name, height,
MAX(height) OVER () AS max_height
FROM sashelp.class ;
The SQL Standard specifies the order that the clauses are interpreted, not the order that they are executed. As a matter of practice, this means that having
clauses are parsed after select
, so they can use variables defined in the select
statement. SAS SQL varies from other dialects. You may be able to use select
variables in the where
clause.
SQL engines are free to actually run the query however they like. Your question, though, is not about running the query but about parsing it.
The use of a having
clause without a group by
is generally suspect. However, I believe it should work in any dialect. The only question is whether 0 rows are returned or 1 summary row.
That your particularly query works is surprising. In most dialects of SQL, you would get an error, because the the having
clause would imply an aggregation, and the values in the select
are not in aggregation functions. This only other dialect where this would work is MySQL because it has a (mis)feature called Hidden Columns. SAS SQL is a bit farther from standard SQL than other dialects.
I cannot speak in terms of the SQL "standard" but with respect to the SAS implementation, the only difference between WHERE
and HAVING
clauses is "when" they are applied. WHERE
clauses apply to the creation of intermediate results (building a temp table containing the columns listed in a SELECT
statement) and a HAVING
clause controls which rows are written to the final table.
Alternatively, one can use the CALCULATED
keyword on a WHERE
statement:
SELECT name, height * weight as inchpounds
FROM sashelp.class
WHERE CALCULATED inchpounds > 5000;
Is this in fact a false assumption on my part?
It can be false. The query planner will decide how to run the query according to the data and the statistics that the database has.
Thinking about the different clauses running in specific sequence is helpful in writing and reasoning about queries but this is not how most SQL databases actually work.