Give priority to ORDER BY over a GROUP BY in MySQL

Posted 2019-04-08 00:33

I have the following query which does what I want, but I suspect it is possible to do this without a subquery:

  SELECT * 
    FROM (SELECT * 
            FROM `versions` 
        ORDER BY `ID` DESC) AS X 
GROUP BY `program`

What I need is to group by program, but for each program return the row in versions with the highest value of "ID".

In my experience a query like this should work in MySQL, but for some reason it does not:

  SELECT * 
    FROM `versions` 
GROUP BY `program` 
ORDER BY MAX(`ID`) DESC

What I want is for MySQL to do the ORDER BY first and then the GROUP BY, but it insists on doing the GROUP BY first and the ORDER BY afterwards. That is, it sorts the results of the grouping instead of grouping the results of the ordering.

Of course it is not possible to write:

SELECT * FROM `versions` ORDER BY `ID` DESC GROUP BY `program`

Thanks.

3 answers
劫难
#2 · 2019-04-08 01:01
SELECT  v.*
FROM    (
        SELECT  DISTINCT program
        FROM    versions
        ) vd
JOIN    versions v
ON      v.id = 
        (
        SELECT  vi.id
        FROM    versions vi
        WHERE   vi.program = vd.program
        ORDER BY
                vi.program DESC, vi.id DESC
        LIMIT 1
        )

Create an index on (program, id) for this to work fast.
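
As a rough sketch of that index, assuming the `versions` table from the question already exists with `program` and `id` columns (the index name `idx_versions_program_id` is made up for illustration):

```sql
-- Assumption: table `versions` already exists with columns `program` and `id`.
-- The index name below is hypothetical; pick whatever fits your naming scheme.
CREATE INDEX idx_versions_program_id ON versions (program, id);

-- List the table's indexes to confirm the new one is present.
SHOW INDEX FROM versions;
```

With both columns in one index, the correlated subquery can look up the highest id for a given program from the index alone rather than scanning the table.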

Regarding your original query:

SELECT * FROM `versions` GROUP BY `program` ORDER BY MAX(`ID`) DESC

Most SQL dialects other than MySQL would reject this query.

It abuses MySQL's ability to return ungrouped and unaggregated expressions from a query with a GROUP BY clause.
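
To see the difference, one option is to switch the session to strict grouping. The sketch below assumes MySQL 5.7.5 or later, where ONLY_FULL_GROUP_BY is part of the default sql_mode; the inline rows are made-up sample data standing in for the real table.

```sql
-- Note: this replaces every other sql_mode flag for the current session only.
SET SESSION sql_mode = 'ONLY_FULL_GROUP_BY';

-- Should now be rejected: `id` is neither grouped nor aggregated.
SELECT *
FROM (
    SELECT 1 AS id, 'alpha' AS program UNION ALL
    SELECT 2, 'beta' UNION ALL
    SELECT 3, 'alpha'
) AS versions
GROUP BY program
ORDER BY MAX(id) DESC;

-- Accepted in strict mode: only the grouping column and an aggregate are selected.
SELECT program, MAX(id) AS max_id
FROM (
    SELECT 1 AS id, 'alpha' AS program UNION ALL
    SELECT 2, 'beta' UNION ALL
    SELECT 3, 'alpha'
) AS versions
GROUP BY program
ORDER BY max_id DESC;
```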

对你真心纯属浪费
#3 · 2019-04-08 01:04

By definition, ORDER BY is processed after grouping with GROUP BY. Conceptually, any SELECT statement is processed as follows:

  1. Compute the cartesian product of all tables referenced in the FROM clause
  2. Apply the join criteria from the FROM clause to filter the results
  3. Apply the filter criteria in the WHERE clause to further filter the results
  4. Group the results into subsets based on the GROUP BY clause, collapsing the results to a single row for each such subset and computing the values of any aggregate functions -- SUM(), MAX(), AVG(), etc. -- for each such subset. Note that if no GROUP BY clause is specified, the results are treated as if there is a single subset and any aggregate functions apply to the entire results set, collapsing it to a single row.
  5. Filter the now-grouped results based on the HAVING clause.
  6. Sort the results based on the ORDER BY clause.
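
The steps above can be traced on a small query. This is only an illustration: the inline rows are invented sample data, and the comments map each clause to the conceptual step, not to what the optimizer physically does.

```sql
SELECT   program, MAX(id) AS max_id   -- computed per group (step 4)
FROM (                                -- steps 1-2: build the row source
    SELECT 1 AS id, 'alpha' AS program UNION ALL
    SELECT 2, 'beta'  UNION ALL
    SELECT 3, 'alpha' UNION ALL
    SELECT 4, 'beta'  UNION ALL
    SELECT 5, 'gamma' UNION ALL
    SELECT 6, 'delta'
) AS versions
WHERE    id <> 5                      -- step 3: drops the only 'gamma' row
GROUP BY program                      -- step 4: alpha {1,3}, beta {2,4}, delta {6}
HAVING   COUNT(*) >= 2                -- step 5: drops 'delta' (one row)
ORDER BY max_id DESC;                 -- step 6: sorts the two surviving groups
-- Result: ('beta', 4), then ('alpha', 3)
```

Because sorting is the last step, ORDER BY can only reorder rows that grouping has already produced; it has no say in which row of a group is kept.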

The only columns allowed in the results set of a SELECT with a GROUP BY clause are, of course,

  • The columns referenced in the GROUP BY clause
  • Aggregate functions (such as MAX())
  • Literals/constants
  • Expressions derived from any of the above.

Only broken SQL implementations allow things like select xxx,yyy,a,b,c FROM foo GROUP BY xxx,yyy: the references to columns a, b and c are meaningless/undefined, given that the individual groups have been collapsed to a single row.
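
A select list that uses one item from each category in the list above might look like this. The inline rows and the `label`/`tag` aliases are invented for illustration.

```sql
SELECT program,                                -- a GROUP BY column
       MAX(id)                       AS max_id, -- an aggregate function
       'latest'                      AS label,  -- a literal
       CONCAT(program, '#', MAX(id)) AS tag     -- an expression built from the above
FROM (
    SELECT 1 AS id, 'alpha' AS program UNION ALL
    SELECT 2, 'beta' UNION ALL
    SELECT 3, 'alpha'
) AS versions
GROUP BY program
ORDER BY program;
-- Result: ('alpha', 3, 'latest', 'alpha#3'), then ('beta', 2, 'latest', 'beta#2')
```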

虎瘦雄心在
#4 · 2019-04-08 01:08

This should do it and work pretty well as long as there is a composite index on (program,id). The subquery should only inspect the very first id for each program branch, and quickly retrieve the required record from the outer query.

select v.*
from
(
    select program, MAX(id) id
    from versions
    group by program
) m
inner join versions v on m.program=v.program and m.id=v.id
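
For a quick check, here is the same query run against a few made-up rows. The sketch assumes a scratch database with no existing `versions` table; the `label` column and the index name are hypothetical.

```sql
-- Assumed schema: only `id` and `program` come from the question.
CREATE TABLE versions (
    id      INT         NOT NULL PRIMARY KEY,
    program VARCHAR(64) NOT NULL,
    label   VARCHAR(16) NOT NULL,      -- hypothetical payload column
    KEY idx_program_id (program, id)   -- the composite index mentioned above
);

INSERT INTO versions (id, program, label) VALUES
    (1, 'alpha', '1.0'),
    (2, 'beta',  '0.9'),
    (3, 'alpha', '1.1'),
    (4, 'beta',  '1.0'),
    (5, 'gamma', '2.0');

-- Highest id per program, joined back to fetch the full row.
SELECT v.*
FROM (
    SELECT program, MAX(id) AS id
    FROM versions
    GROUP BY program
) AS m
INNER JOIN versions v ON m.program = v.program AND m.id = v.id
ORDER BY v.program;
-- Result: (3, 'alpha', '1.1'), (4, 'beta', '1.0'), (5, 'gamma', '2.0')
```

The ORDER BY on the outer query is added here only to make the output order predictable.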