SQL: Remove duplicates

2019-01-27 11:05发布

问题:

How do I remove duplicates from a table that is set up in the following way?

unique_ID | worker_ID | date | type_ID

A worker can have multiple type_ID's associated with them and I want to remove any duplicate types. If there is a duplicate, I want to remove the type with the most recent entry.

回答1:

A textbook candidate for the window function row_number():

;WITH x AS (
    SELECT unique_ID
          ,row_number() OVER (PARTITION BY worker_ID,type_ID ORDER BY date) AS rn
    FROM   tbl
    )
DELETE FROM tbl
FROM   x
WHERE  tbl.unique_ID = x.unique_ID
AND    x.rn > 1

This also takes care of the situation where a set of dupes on (worker_ID,type_ID) shares the same date.
See the simplified demo on data.SE.

Update with simpler version

Turns out, this can be simplified: In SQL Server you can delete from the CTE directly:

;WITH x AS (
    SELECT unique_ID
          ,row_number() OVER (PARTITION BY worker_ID,type_ID ORDER BY date) AS rn
    FROM   tbl
    )
DELETE x
WHERE  rn > 1


回答2:

delete from table t
 where exists ( select 1 from table t2 
                 where t2.worker_id = t.worker_id
                   and t2.type_id = t.type_id
                   and t2.date < t.date )

HTH



回答3:

DELETE FROM @t WHERE unique_Id IN 
(
    SELECT unique_Id FROM 
    (   
        SELECT  unique_Id
                ,Type_Id 
                ,ROW_NUMBER() OVER (PARTITION BY worker_Id, type_Id ORDER BY date) AS rn 
        FROM @t 
    ) Q 
    WHERE rn > 1
)

And to test...

DECLARE @t TABLE
(
    unique_ID  INT IDENTITY,
    worker_ID  INT,
    date  DATETIME,
    type_ID INT
)

INSERT INTO @t VALUES (1, DATEADD(DAY, 1, GETDATE()), 1)
INSERT INTO @t VALUES (1, GETDATE(), 1)
INSERT INTO @t VALUES (2, GETDATE(), 1)
INSERT INTO @t VALUES (1, DATEADD(DAY, 2, GETDATE()), 1)
INSERT INTO @t VALUES (1, DATEADD(DAY, 3, GETDATE()), 2)

SELECT * FROM @t

DELETE FROM @t WHERE unique_Id IN 
(
    SELECT unique_Id FROM 
    (   
        SELECT  unique_Id
                ,Type_Id 
                ,ROW_NUMBER() OVER (PARTITION BY worker_Id, type_Id ORDER BY date) AS rn 
        FROM @t 
    ) Q 
    WHERE rn > 1
)

SELECT * FROM @t



回答4:

you may use this query

delete from worker where unique_id in (
select max(unique_id)  from worker group by  worker_ID , type_ID having count(type_id)>1)

here i am assuming worker as your table name