GROUP BY having MAX date

2019-01-16 18:17发布

问题:

I have problem when executing this code:

SELECT * FROM tblpm n 
WHERE date_updated=(SELECT MAX(date_updated) 
FROM tblpm GROUP BY control_number 
HAVING control_number=n.control_number)

Basically, I want to return the most recent date for each control number. The query above returns correct output but it takes 37secs. before the output was shown.

Is there any other sql clause or command that can execute faster than the query above?

Thanks in advance.

回答1:

Putting the subquery in the WHERE clause and restricting it to n.control_number means it runs the subquery many times. This is called a correlated subquery, and it's often a performance killer.

It's better to run the subquery once, in the FROM clause, to get the max date per control number.

SELECT n.* 
FROM tblpm n 
INNER JOIN (
  SELECT control_number, MAX(date_updated) AS date_updated
  FROM tblpm GROUP BY control_number
) AS max USING (control_number, date_updated);


回答2:

There's no need to group in that subquery... a where clause would suffice:

SELECT * FROM tblpm n
WHERE date_updated=(SELECT MAX(date_updated)
    FROM tblpm WHERE control_number=n.control_number)

Also, do you have an index on the 'date_updated' column? That would certainly help.



回答3:

Another way that doesn't use group by:

SELECT * FROM tblpm n 
  WHERE date_updated=(SELECT date_updated FROM tblpm n 
                        ORDER BY date_updated desc LIMIT 1)


回答4:

Fast and easy with HAVING:

SELECT * FROM tblpm n 
FROM tblpm GROUP BY control_number 
HAVING date_updated=MAX(date_updated);

In the context of HAVING, MAX finds the max of each group. Only the latest entry in each group will satisfy date_updated=max(date_updated). If there's a tie for latest within a group, both will pass the HAVING filter, but GROUP BY means that only one will appear in the returned table.