Delete multiple duplicate rows in table

2020-07-17 05:49发布

I have multiple groups of duplicates in one table (3 records for one, 2 for another, etc) - multiple rows where more than 1 exists.

Below is what I came up with to delete them, but I have to run the script for however many duplicates there are:

set rowcount 1
delete from Table
where code in (
  select code from Table 
  group by code
  having (count(code) > 1)
)
set rowcount 0

This works well to a degree. I need to run this for every group of duplicates, and then it only deletes 1 (which is all I need right now).

4条回答
混吃等死
2楼-- · 2020-07-17 06:32
SET ROWCOUNT 1    
DELETE Table    
FROM Table a    
WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code ) > 1    
WHILE @@rowcount > 0    
  DELETE Table    
  FROM Table a    
  WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code ) > 1    
SET ROWCOUNT 0

this will delete all duplicate rows, But you can add attributes if you want to compare according to them .

查看更多
Emotional °昔
3楼-- · 2020-07-17 06:34

It is frequently more efficient to copy unique rows into temporary table,
drop source table, rename back temporary table.

I reused the definition and data of #TempTable, called here as SrcTable instead, since it is impossible to rename temporary table into a regular one)

create table SrcTable
(
    ID int identity(1,1) not null primary key,
    SomeData varchar(100) not null
)

insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData3')
insert into SrcTable(SomeData) values('someData4')

by John Sansom in previous answer

-- cloning "unique" part
SELECT * INTO TempTable 
FROM SrcTable --original table
WHERE id IN  
(SELECT MAX(id) AS ID
FROM SrcTable
GROUP BY SomeData);
GO;

DROP TABLE SrcTable
GO;

sys.sp_rename 'TempTable', 'SrcTable'
查看更多
劫难
4楼-- · 2020-07-17 06:39

If you have a key column on the table, then you can use this to uniquely identify the "distinct" rows in your table.

Just use a sub query to identify a list of ID's for unique rows and then delete everything outside of this set. Something along the lines of.....

create table #TempTable
(
    ID int identity(1,1) not null primary key,
    SomeData varchar(100) not null
)

insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData3')
insert into #TempTable(SomeData) values('someData4')

select * from #TempTable

--Records to be deleted
SELECT ID
FROM #TempTable
WHERE ID NOT IN
(
    select MAX(ID)
    from #TempTable
    group by SomeData
)

--Delete them
DELETE
FROM #TempTable
WHERE ID NOT IN
(
    select MAX(ID)
    from #TempTable
    group by SomeData
)

--Final Result Set
select * from #TempTable

drop table #TempTable;

Alternatively you could use a CTE for example:

WITH UniqueRecords AS
(
    select MAX(ID) AS ID
    from #TempTable
    group by SomeData
)
DELETE A
FROM #TempTable A
    LEFT outer join UniqueRecords B on
        A.ID = B.ID
WHERE B.ID IS NULL
查看更多
Viruses.
5楼-- · 2020-07-17 06:40

You can alternatively use ROW_NUMBER() function to filter out duplicates

;WITH [CTE_DUPLICATES] AS 
(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY SomeData)
FROM #TempTable
) 
DELETE FROM [CTE_DUPLICATES] WHERE RN > 1
查看更多
登录 后发表回答