SQL Server Duplicate Records

2019-08-03 15:24发布

问题:

Hello I have done the following query below:

UPDATE [dbo].[TestData]
SET Duplicate = 'Duplicate within'
WHERE exists 
(SELECT telephone, COUNT(telephone)
FROM [dbo].[TestData]
GROUP BY telephone
HAVING (COUNT (telephone)>1))

In that table there are actually 9 duplicate telephone records.

The query is stamping the entire duplicate column as 'Duplicate within' instead of the 9 records.

The next following query I have also developed which will unstamp the 18 duplicate records to 9.

UPDATE [dbo].[TestData]
SET Duplicate = 'NO'
WHERE ID IN (SELECT MIN(ID) FROM [dbo].[TestData] GROUP BY telephone)

This query is not working neither could anyone please guide me on where I am going wrong!

回答1:

You could do this using where exists, but it's easier to write/read this way and the performance difference is most likely minimal.

update TestData set 
    Duplicate = 'Duplicate within'
where 
    Telephone in (
        select Telephone 
        from TestData 
        group by Telephone 
        having count(*) > 1
    )

To leave the first record with each telephone number alone and mark only the subsequent records with the same telephone number, use a cte as follows:

;with NumberedDupes as (
    select
        Telephone,
        Duplicate,
        row_number() over (partition by Telephone order by Telephone) seq
    from TestData
)
update NumberedDupes set Duplicate = 'Duplicate within' where seq > 1


回答2:

The issue is that the EXISTS query isn't filtered. It needs to be filtered by each phone #:

UPDATE [dbo].[TestData]
SET Duplicate = 'Duplicate within'
FROM [TestData] t
WHERE EXISTS (
    SELECT telephone, COUNT(telephone)
    FROM [dbo].[TestData]
    WHERE telephone = t.telephone
    GROUP BY telephone
    HAVING (COUNT (telephone)>1))
)


回答3:

If you just want to find the duplicates, you need to look at one of the two records as demonstrated by the sub select below. The EXISTS will actually make you update both rows since that is what you are testing.

    UPDATE [dbo].[TestData]
    SET Duplicate = 'Duplicate within'
    WHERE Id IN  
    (SELECT MAX(Id)
    FROM [dbo].[TestData]
    GROUP BY telephone
    HAVING (COUNT (telephone)>1))