In SQL, is HAVING processed before or after SELECT

2019-04-09 21:41发布

问题:

I was surprised to find that the following works:

    SELECT name, height * weight as inchpounds
    FROM sashelp.class 
    HAVING inchpounds > 5000;

Since I thought the HAVING clause was processed strictly before the SELECT clause. Is this in fact a false assumption on my part? Or is this peculiar to SAS's implementation of SQL?

回答1:

Strictly speaking, and if I remember well, the SQL standard define HAVING clause to be processed before the SELECT clause. So, an alias defined in the SELECT, cannot be used in HAVING clause.

See my answer in a related question: Using 'case expression column' in where clause for the order of proccessing a SELECT statement, which is:

FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY 

So, in most SQL incarnations, that query will yield an error. MySQL is one exception I know that allows this deviation from the standard (and where SELECT aliases are allowed to be used in the GROUP BY clause as well).

As @a_horse_with_no_name mentioned in comments, the query would fail in most SQL products for another reason, too: HAVING without GROUP BY means there is an aggregation over all the rows and thus only aggregate functions would be allowed in both the HAVING and the SELECT clauses.


After looking at the SAS SQL documentation and examples, it seems the query you posted above is valid in SAS SQL and executed as this (in standard SQL):

SELECT name, height * weight AS inchpounds
FROM sashelp.class 
WHERE height * weight > 5000;

If I understand it correctly, you may also mix aggregate functions with non-aggregated columns, like this:

SELECT name, height, 
       MAX(height) AS max_height
FROM sashelp.class ;

which would be evaluated as (in standard SQL):

SELECT name, height, 
       MAX(height) OVER () AS max_height
FROM sashelp.class ;


回答2:

The SQL Standard specifies the order that the clauses are interpreted, not the order that they are executed. As a matter of practice, this means that having clauses are parsed after select, so they can use variables defined in the select statement. SAS SQL varies from other dialects. You may be able to use select variables in the where clause.

SQL engines are free to actually run the query however they like. Your question, though, is not about running the query but about parsing it.

The use of a having clause without a group by is generally suspect. However, I believe it should work in any dialect. The only question is whether 0 rows are returned or 1 summary row.

That your particularly query works is surprising. In most dialects of SQL, you would get an error, because the the having clause would imply an aggregation, and the values in the select are not in aggregation functions. This only other dialect where this would work is MySQL because it has a (mis)feature called Hidden Columns. SAS SQL is a bit farther from standard SQL than other dialects.



回答3:

I cannot speak in terms of the SQL "standard" but with respect to the SAS implementation, the only difference between WHERE and HAVING clauses is "when" they are applied. WHERE clauses apply to the creation of intermediate results (building a temp table containing the columns listed in a SELECT statement) and a HAVING clause controls which rows are written to the final table.

Alternatively, one can use the CALCULATED keyword on a WHERE statement:

SELECT name, height * weight as inchpounds
FROM sashelp.class 
WHERE CALCULATED inchpounds > 5000; 


回答4:

Is this in fact a false assumption on my part?

It can be false. The query planner will decide how to run the query according to the data and the statistics that the database has.

Thinking about the different clauses running in specific sequence is helpful in writing and reasoning about queries but this is not how most SQL databases actually work.



标签: sql sas