How to delete duplicate rows in SQL Server?

Posted 2018-12-31 09:48

How can I delete duplicate rows where no unique row id exists?

My table is

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicate removal:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I've tried a few queries, but I think they depend on a row id, as I don't get the desired result. For example:

DELETE FROM table WHERE col1 IN (
    SELECT id FROM table GROUP BY id HAVING ( COUNT(col1) > 1 )
)

15 answers
孤独寂梦人
#2 · 2018-12-31 10:16
DELETE from search
where id not in (
   select min(id) from search
   group by url
   having count(*)=1

   union

   SELECT min(id) FROM search
   group by url
   having count(*) > 1
)
不流泪的眼
#3 · 2018-12-31 10:16

Here is another way to delete the duplicates.

Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values 
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)

The code above creates a sample table variable named @table and loads it with the given data.

Delete  aliasName from (
Select  *,
        ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From    @table) aliasName 
Where   rowNumber > 1

Select * from @table

Note: If you list all columns in the PARTITION BY clause, the ORDER BY column has little significance, because every row within a partition is identical.

I know the question was asked three years ago and my answer is another version of what Tim has posted, but I'm posting it just in case it is helpful to anyone.

笑指拈花
#4 · 2018-12-31 10:17

Oh wow, I feel so stupid reading all these answers; they read like experts' answers, with all the CTEs, temp tables and so on.

All I did to get it working was simply aggregate the ID column using MAX.

DELETE FROM table WHERE id IN (
    SELECT MAX(id) FROM table GROUP BY col1 HAVING ( COUNT(col1) > 1 )
)

NOTE: You might need to run it multiple times to remove all duplicates, as each run deletes only one row (the one with the highest id) from each set of duplicates.
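
The "run it multiple times" step can be automated with a loop that stops once a pass deletes nothing. This is only a sketch under assumptions that are not in the question: the temp table #t is made up for illustration, it has an identity column id, and duplicates are defined by col1 alone.

```sql
-- Hypothetical table with an id column (the question's table has none).
CREATE TABLE #t (id int IDENTITY(1,1) PRIMARY KEY, col1 varchar(10));
INSERT INTO #t (col1) VALUES ('john'), ('john'), ('john'), ('sally'), ('sally');

DECLARE @deleted int = 1;

-- Each pass removes the highest id from every group that still has duplicates.
WHILE @deleted > 0
BEGIN
    DELETE FROM #t
    WHERE id IN (
        SELECT MAX(id) FROM #t GROUP BY col1 HAVING COUNT(*) > 1
    );
    SET @deleted = @@ROWCOUNT;  -- 0 once no group has more than one row
END;

SELECT * FROM #t;  -- one row per col1 value remains (the lowest id of each)
```

With three copies of 'john' the loop needs two deleting passes plus one final pass that finds nothing to delete.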

深知你不懂我心
#5 · 2018-12-31 10:20

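If you know the number of duplicate rows, for instance a row appears n times, you can limit the delete to n-1 rows with this command:

SET ROWCOUNT n-1  -- placeholder: substitute the actual number, e.g. 1 when n = 2
DELETE FROM your_table
WHERE (special condition)  -- a condition that matches only the duplicated rows
SET ROWCOUNT 0  -- reset, so later statements are not limited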

for more info I suggest this
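
For what it's worth, Microsoft's documentation discourages SET ROWCOUNT for DELETE statements and recommends TOP instead. Here is a sketch of the same idea with DELETE TOP; the temp table #your_table, its columns and the WHERE condition are all made up for illustration.

```sql
-- Hypothetical data: 'john' appears three times, 'sally' once.
CREATE TABLE #your_table (col1 varchar(10), col2 int);
INSERT INTO #your_table VALUES ('john', 1), ('john', 1), ('john', 1), ('sally', 2);

-- n = number of identical copies of the row we want to reduce to one.
DECLARE @n int = (SELECT COUNT(*) FROM #your_table
                  WHERE col1 = 'john' AND col2 = 1);

-- Delete n-1 of the copies; the guard avoids passing 0 or a negative value to TOP.
IF @n > 1
    DELETE TOP (@n - 1) FROM #your_table
    WHERE col1 = 'john' AND col2 = 1;

SELECT * FROM #your_table;  -- one 'john' row and one 'sally' row remain
```

Since the copies are identical, it does not matter which n-1 of them TOP picks. Like the SET ROWCOUNT version, this handles one set of duplicates per statement.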

若你有天会懂
#6 · 2018-12-31 10:27

Another way of removing duplicated rows in one step without losing information (for each duplicated value, the row with the shortest field_kept is deleted) is the following:

-- T-SQL needs the alias right after DELETE, and NOLOCK is not allowed on the delete target
delete t1
from dublicated_table t1
join (
    select t2.dublicated_field
    , min(len(t2.field_kept)) as min_field_kept
    from dublicated_table t2 with (nolock)
    group by t2.dublicated_field having COUNT(*)>1
) t3 
on t1.dublicated_field=t3.dublicated_field 
    and len(t1.field_kept)=t3.min_field_kept
姐姐魅力值爆表
#7 · 2018-12-31 10:28

I like CTEs and ROW_NUMBER because the two combined let us see which rows will be deleted (or updated) before anything changes: just change the DELETE FROM CTE... to SELECT * FROM CTE:

WITH CTE AS(
   SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
       RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
   FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1

DEMO (result is different; I assume that it's due to a typo on your part)

COL1    COL2    COL3    COL4    COL5    COL6    COL7
john    1        1       1       1       1       1
sally   2        2       2       2       2       2

This example determines duplicates by a single column col1 because of the PARTITION BY col1. If you want to include multiple columns simply add them to the PARTITION BY:

ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
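
Putting both points together for the table in the question, it could look roughly like this. It's a sketch, not a definitive script: the temp table #people is a made-up stand-in for the real table, and ORDER BY col1 is an arbitrary choice, since the rows in each partition are identical anyway.

```sql
-- Hypothetical temp table mirroring the columns from the question.
CREATE TABLE #people (
    col1 varchar(10), col2 int, col3 int, col4 int, col5 int, col6 int, col7 int
);

INSERT INTO #people VALUES
    ('john',  1, 1, 1, 1, 1, 1),
    ('john',  1, 1, 1, 1, 1, 1),
    ('sally', 2, 2, 2, 2, 2, 2),
    ('sally', 2, 2, 2, 2, 2, 2);

-- Step 1: preview. Rows with RN > 1 are the ones the DELETE would remove.
WITH CTE AS (
    SELECT *,
           RN = ROW_NUMBER() OVER (
                    PARTITION BY col1, col2, col3, col4, col5, col6, col7
                    ORDER BY col1)
    FROM #people
)
SELECT * FROM CTE WHERE RN > 1;

-- Step 2: same CTE, with the SELECT swapped for a DELETE.
WITH CTE AS (
    SELECT *,
           RN = ROW_NUMBER() OVER (
                    PARTITION BY col1, col2, col3, col4, col5, col6, col7
                    ORDER BY col1)
    FROM #people
)
DELETE FROM CTE WHERE RN > 1;

SELECT * FROM #people;  -- one 'john' row and one 'sally' row remain
```

The statement before each WITH has to end with a semicolon, which is why every statement above is terminated explicitly.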