I have a table with unique id but duplicate row information.
I can find the rows with duplicates using this query
SELECT
PersonAliasId, StartDateTime, GroupId, COUNT(*) as Count
FROM
Attendance
GROUP BY
PersonAliasId, StartDateTime, GroupId
HAVING
COUNT(*) > 1
I can manually delete the rows while keeping the 1 I need with this query
Delete
From Attendance
Where Id IN(SELECT
Id
FROM
Attendance
Where PersonAliasId = 15
and StartDateTime = '9/24/2017'
and GroupId = 1429
Order By ModifiedDateTIme Desc
Offset 1 Rows)
I am not versed in SQL enough to figure out how to use the rows in the first query to delete the duplicates leaving behind the most recent. There are over 3481 records returned by the first query to do this one by one manually.
How can I find the duplicate rows like the first query and delete all but the most recent like the second?
You can use a Common Table Expression
to delete the duplicates:
WITH Cte AS(
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
)
DELETE FROM Cte WHERE Rn > 1;
This will keep the most recent record for each PersonAliasId - StartDateTime - GroupId
combination.
Use the MAX aggregate function to identify the latest startdatetime for each group/person combination. Then delete records which do not have that latest time.
DELETE a
FROM attendance as a
INNER JOIN (
SELECT
PersonAliasId, MAX(StartDateTime) AS LatestTime, GroupId,
FROM
Attendance
GROUP BY
PersonAliasId, GroupId
HAVING
COUNT(*) > 1
) as b
on a.personaliasid=b.personaliasid and a.groupid=b.groupid and a.startdatetime < b.latesttime
Same as the CTE answer - give Felix the check
delete
from ( SELECT rn = ROW_NUMBER() OVER(PARTITION BY PersonAliasId, StartDateTime, GroupId
ORDER BY ModifiedDateTIme DESC)
FROM Attendance
) tt
where tt.rn > 1