Delete Duplicate Rows in SQL

2019-05-13 01:47发布

问题:

I have a table with unique id but duplicate row information.

I can find the rows with duplicates using this query

SELECT
    PersonAliasId, StartDateTime, GroupId, COUNT(*) as Count
FROM
    Attendance
GROUP BY
    PersonAliasId, StartDateTime, GroupId
HAVING
    COUNT(*) > 1

I can manually delete the rows while keeping the 1 I need with this query

Delete
From Attendance
Where Id IN(SELECT
    Id
FROM
    Attendance
Where PersonAliasId = 15
    and StartDateTime = '9/24/2017'
and GroupId = 1429
Order By ModifiedDateTIme Desc
Offset 1 Rows)

I am not versed in SQL enough to figure out how to use the rows in the first query to delete the duplicates leaving behind the most recent. There are over 3481 records returned by the first query to do this one by one manually.

How can I find the duplicate rows like the first query and delete all but the most recent like the second?

回答1:

You can use a Common Table Expression to delete the duplicates:

WITH Cte AS(
    SELECT *,
        Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId 
                                ORDER BY ModifiedDateTIme DESC)
    FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;

This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId combination.



回答2:

Use the MAX aggregate function to identify the latest startdatetime for each group/person combination. Then delete records which do not have that latest time.

DELETE a
FROM attendance as a
INNER JOIN (  
   SELECT
        PersonAliasId, MAX(StartDateTime) AS LatestTime, GroupId,
    FROM
        Attendance
    GROUP BY
        PersonAliasId, GroupId
    HAVING
        COUNT(*) > 1
) as b
on a.personaliasid=b.personaliasid and a.groupid=b.groupid and a.startdatetime < b.latesttime


回答3:

Same as the CTE answer - give Felix the check

delete 
from ( SELECT rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId 
                                     ORDER BY ModifiedDateTIme DESC)
        FROM Attendance
     ) tt 
where tt.rn > 1