Select the top 1 row from each group

Posted 2020-06-11 20:12

Question:

I have a table that lists the versions of software that are installed:

id  | userid | version | datetime
----+--------+---------+------------------------
111 | 75     | 10075   | 2013-03-12 13:40:58.770
112 | 75     | 10079   | 2013-03-12 13:41:01.583
113 | 78     | 10065   | 2013-03-12 14:18:24.463
114 | 78     | 10079   | 2013-03-12 14:22:20.437
115 | 78     | 10079   | 2013-03-12 14:24:01.830
116 | 78     | 10080   | 2013-03-12 14:24:06.893
117 | 74     | 10080   | 2013-03-12 15:31:42.797
118 | 75     | 10079   | 2013-03-13 07:03:56.157
119 | 75     | 10080   | 2013-03-13 07:05:23.137
120 | 65     | 10080   | 2013-03-13 07:24:33.323
121 | 68     | 10080   | 2013-03-13 08:03:24.247
122 | 71     | 10080   | 2013-03-13 08:20:16.173
123 | 78     | 10080   | 2013-03-13 08:28:25.487
124 | 56     | 10080   | 2013-03-13 08:49:44.503

For each userid, I would like to display all fields of a single record: the one with the highest version (note that version is a varchar).

Answer 1:

You haven't specified how you want ties handled, but this will do it if you want the duplicates displayed:

SELECT a.* FROM MyTable a
LEFT JOIN MyTable b
  ON a.userid=b.userid
 AND CAST(a.version AS INT) < CAST(b.version AS INT)
WHERE b.version IS NULL

An SQLfiddle to test with.

If you want to eliminate duplicates and, where they exist, pick the newest of them, you'll have to extend the query somewhat:

WITH cte AS (SELECT *, CAST(version AS INT) num_version FROM MyTable)
SELECT a.id, a.userid, a.version, a.datetime 
FROM cte a LEFT JOIN cte b
  ON a.userid=b.userid
 AND (a.num_version < b.num_version OR 
     (a.num_version = b.num_version AND a.[datetime]<b.[datetime]))
WHERE b.version IS NULL

Another SQLfiddle.
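
To make the difference between the two queries concrete, here is a minimal sketch that runs both against a few of the question's rows. It uses Python's built-in sqlite3 module only so that it is self-contained; the table name installs is made up for the example, and SQLite spells the cast as CAST(... AS INTEGER) rather than the SQL Server syntax above.

```python
import sqlite3

# A subset of the question's rows; the table name "installs" is hypothetical.
ROWS = [
    (111, 75, "10075", "2013-03-12 13:40:58.770"),
    (116, 78, "10080", "2013-03-12 14:24:06.893"),
    (117, 74, "10080", "2013-03-12 15:31:42.797"),
    (119, 75, "10080", "2013-03-13 07:05:23.137"),
    (123, 78, "10080", "2013-03-13 08:28:25.487"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE installs (id INTEGER, userid INTEGER, version TEXT, datetime TEXT)"
)
conn.executemany("INSERT INTO installs VALUES (?, ?, ?, ?)", ROWS)

# Anti-join that keeps ties: a row survives when no row for the same
# user has a strictly higher version.
with_ties = conn.execute("""
    SELECT a.id FROM installs a
    LEFT JOIN installs b
      ON a.userid = b.userid
     AND CAST(a.version AS INTEGER) < CAST(b.version AS INTEGER)
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

# Same anti-join, but an equal version with a later datetime also
# counts as "better", so only the newest of the tied rows survives.
newest_only = conn.execute("""
    SELECT a.id FROM installs a
    LEFT JOIN installs b
      ON a.userid = b.userid
     AND (CAST(a.version AS INTEGER) < CAST(b.version AS INTEGER)
          OR (CAST(a.version AS INTEGER) = CAST(b.version AS INTEGER)
              AND a.datetime < b.datetime))
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

with_ties_ids = [row[0] for row in with_ties]
newest_only_ids = [row[0] for row in newest_only]
print(with_ties_ids)    # [116, 117, 119, 123]
print(newest_only_ids)  # [117, 119, 123]
```

User 78 has two rows at version 10080 (ids 116 and 123). The first query returns both; the second keeps only id 123, the later one.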



Answer 2:

If you use SQL Server (2005 or later), you can use a CTE with the ROW_NUMBER function. Because version is a varchar, CAST it to INT to get the correct order:

WITH cte 
     AS (SELECT id, 
                userid, 
                version, 
                datetime, 
                Row_number() 
                  OVER ( 
                    partition BY userid 
                    ORDER BY Cast(version AS INT) DESC) rn 
         FROM   [dbo].[table]) 
SELECT id, 
       userid, 
       version, 
       datetime 
FROM   cte 
WHERE  rn = 1 
ORDER BY userid

Demo

ROW_NUMBER always returns exactly one record per userid, even if that user has several records with the same (top) version. If you want to return all of a user's top-version records, replace ROW_NUMBER with DENSE_RANK.
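
As a rough illustration of that difference, the sketch below runs the same CTE once with ROW_NUMBER and once with DENSE_RANK. It is written against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), and the table name installs and the helper top_ids are invented for the example, so treat it as a sketch rather than SQL Server code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE installs (id INTEGER, userid INTEGER, version TEXT)")
# User 78 has two rows tied at the top version 10080 (ids 116 and 123).
conn.executemany(
    "INSERT INTO installs VALUES (?, ?, ?)",
    [(113, 78, "10065"), (116, 78, "10080"), (123, 78, "10080"), (117, 74, "10080")],
)


def top_ids(rank_function):
    """Return ids of rows ranked first per userid (hypothetical helper).

    rank_function is only ever one of the two literals passed below,
    never user input, so interpolating it into the SQL is safe here.
    """
    sql = f"""
        WITH cte AS (
            SELECT id,
                   {rank_function}() OVER (
                       PARTITION BY userid
                       ORDER BY CAST(version AS INTEGER) DESC
                   ) AS rn
            FROM installs
        )
        SELECT id FROM cte WHERE rn = 1 ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]


row_number_ids = top_ids("ROW_NUMBER")
dense_rank_ids = top_ids("DENSE_RANK")

# ROW_NUMBER: one row per userid; which of 116/123 is kept is arbitrary.
print(len(row_number_ids))  # 2
# DENSE_RANK: both tied rows for user 78 are returned.
print(dense_rank_ids)       # [116, 117, 123]
```

If you need a deterministic winner with ROW_NUMBER, adding a second ORDER BY key (for example datetime DESC) removes the arbitrariness.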



Answer 3:

WITH records
AS
(
    SELECT  id, userid, version, datetime,
            -- version is a varchar, so cast it to sort numerically
            ROW_NUMBER() OVER (PARTITION BY userid
                                ORDER BY CAST(version AS INT) DESC) rn
    FROM    tableName
)
SELECT id, userid, version, datetime
FROM    records
WHERE   rn = 1

SQLFiddle Demo


Answer 4:

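I think this may solve your problem:

SELECT id,
       userid,
       version,
       datetime
FROM (
    SELECT id,
           userid,
           version,
           datetime,
           -- partition by userid (not id) so rows are ranked within each user,
           -- and order by the numeric version so rank 1 is the highest version
           DENSE_RANK() OVER (PARTITION BY userid
                              ORDER BY CAST(version AS INT) DESC) AS Rankk
    FROM [dbo].[table]) RS
WHERE Rankk < 2

I used the DENSE_RANK function for your requirement, so records tied for a user's highest version are all returned.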



Answer 5:

select l.* from the_table l
left outer join the_table r
on l.userid = r.userid and cast(l.version as int) < cast(r.version as int)
where r.version is null
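
One caveat that applies to any of these queries: version is a varchar, so comparing or ordering the raw column is a string comparison. That agrees with numeric order only while every version has the same number of digits, as the five-digit values in the question happen to. The sketch below shows what goes wrong otherwise; the two rows are invented for the illustration (they are not from the question), and it uses SQLite via Python's sqlite3 module, although the character-by-character behaviour is the same for a varchar in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE installs (id INTEGER, userid INTEGER, version TEXT)")
# Hypothetical rows: one four-digit and one five-digit version for the same user.
conn.executemany(
    "INSERT INTO installs VALUES (?, ?, ?)",
    [(1, 75, "9999"), (2, 75, "10080")],
)

# Compared as text, '9999' beats '10080' because '9' sorts after '1'.
as_text = conn.execute("SELECT MAX(version) FROM installs").fetchone()[0]

# Compared as integers, 10080 is correctly the highest version.
as_int = conn.execute(
    "SELECT MAX(CAST(version AS INTEGER)) FROM installs"
).fetchone()[0]

print(as_text)  # 9999
print(as_int)   # 10080
```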


Answer 6:

The following query will display what you want and should perform well:

select * from the_table t where cast([version] as int) = 
(select max(cast([version] as int)) from the_table where userid = t.userid)


Answer 7:

If my experience with tuning has taught me anything, it is that generalities are bad, bad, bad.

BUT, if the table you're getting the top X from is large (i.e. hundreds of thousands or millions of rows), CROSS APPLY is almost universally the best. In fact, if you benchmark it, CROSS APPLY performs consistently and admirably at smaller scales as well (in the tens of thousands), and it even covers the potential WITH TIES requirement.

Something like:

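select
    t.id
    ,t.userid
    ,t.version
    ,t.datetime
from
    TheTable t
cross apply
(
    select top 1 --with ties
        id
    from
        TheTable
    where
        userid = t.userid
    order by
        -- highest version first; newest record breaks ties
        cast(version as int) desc
        ,datetime desc
) x -- the applied subquery needs an alias
where
    t.id = x.id -- keep only the row picked for this user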