Querying for a unique value based on the aggregate

2019-05-26 13:05发布

问题:

So I know this problem isn't a new one, but I'm trying to wrap my head around it and understand the best way to deal with scenarios like this.

Say I have a hypothetical table 'X' that looks like this:

GroupID ID (identity)   SomeDateTime
--------------------------------------------
1       1000        1/1/01
1       1001        2/2/02
1       1002        3/3/03
2       1003        4/4/04
2       1004        5/5/05

I want to query it so the result set looks like this:

----------------------------------------
1       1002        3/3/03
2       1004        5/5/05

Basically what I want is the MAX SomeDateTime value grouped by my GroupID column. The kicker is that I DON'T want to group by the ID column, I just want to know the 'ID' that corresponds to the MAX SomeDateTime.

I know one pseudo-solution would be:

;WITH X1 as (
    SELECT MAX(SomeDateTime) as SomeDateTime, GroupID 
    FROM X
    GROUP BY GroupID
)
SELECT X1.SomeDateTime, X1.GroupID, X2.ID
FROM X1
    INNER JOIN X as X2
        ON X.DateTime = X2.DateTime

But this doesn't solve the fact that a DateTime might not be unique. And it seems sloppy to join on a DateTime like that.

Another pseudo-solution could be:

SELECT X.GroupID, MAX(X.ID) as ID, MAX(X.SomeDateTime) as SomeDateTime
FROM X
GROUP BY X.GroupID

But there are no guarantees that ID will actually match the row that SomeDateTime comes from.

A third less useful option might be:

SELECT TOP 1 X.GroupID, X.ID, X.SomeDateTime
FROM X
WHERE X.GroupID = 1
ORDER BY X.SomeDateTime DESC

But obviously that only works with a single, known, GroupID. I want to be able to join this result set on GroupID and/or ID.

Does anyone know of any clever solutions? Any good uses of windowing functions?

Thanks!

回答1:

I think this will do what you want.

;WITH X1 
AS 
(
    SELECT SomeDateTime
           ,GroupID 
           ,ID
           ,ROW_NUMBER() OVER (PARTITION BY GroupID
                               ORDER BY SomeDateTime DESC
                               ) AS rn
    FROM X
)
SELECT SomeDateTime
       ,GroupID
       ,ID
FROM X1
WHERE rn = 1


回答2:

Your first solution is the correct one. I'd have written is as an inner select (note, syntax in innnersleect is ifferent by dbms and I'm used to db2 these days, so that's what you get :-):

Select g.groupid, g.id, g.somedatetime

from x g, (select s.groupid, max(s.somedatetime) from x s group by s.groupid) si

where g.groupid=s.groupid and g.somedatetime=si.somedatetime;

But, as you noted if somedatetime isn't unique for an id, then you 'll get multiple rows for each such group id. to eliminate that, you'll need an additional sub select to get the max(id) for the max(somedatetime). Very messy:

Select g.groupid, g.id, g.somedatetime

from x g, (select s.groupid, max(s.somedatetime) as maxdate from x s group by s.groupid) si,

(select i.groupid, max(i.id) as maxid, i.somedatetime from x i where i.groupid=si.groupid and i.somedatetime=si.maxdate group by i.groupid, i.somedatetime) si2

where g.groupid=s.groupid and g.id=maxid and g.somedatetime=maxdate;



回答3:

You can also use a combination of the cross apply function and the Top(1) to solve this

select
 distinct(GroupId)
 ,maxvales.SomeDateTime
from
 X as outerX
 cross apply
  (select top(1) SomeDateTime from X as innerX where innerX.GroupID = outerX.GroupID order by SomeDateTime desc) as maxvales