Rename duplicate rows

2019-03-29 07:35发布

问题:

Here's a simplified example of my problem. I have a table where there's a "Name" column with duplicate entries:

ID    Name
---   ----
 1    AAA
 2    AAA
 3    AAA
 4    BBB
 5    CCC
 6    CCC
 7    DDD
 8    DDD
 9    DDD
10    DDD

Doing a GROUP BY like SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name results in this:

Name  Count
----  -----
AAA   3
BBB   1
CCC   2
DDD   4

I'm only concerned about the duplicates, so I'll add a HAVING clause, SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name HAVING COUNT(*) > 1:

Name  Count
----  -----
AAA   3
CCC   2
DDD   4

Trivial so far, but now things get tricky: I need a query to get me all the duplicate records, but with a nice incrementing indicator added to the Name column. The result should look something like this:

ID    Name
---   --------
 1    AAA
 2    AAA (2)
 3    AAA (3)
 5    CCC 
 6    CCC (2)
 7    DDD 
 8    DDD (2)
 9    DDD (3)
10    DDD (4)

Note row 4 with "BBB" is excluded, and the first duplicate keeps the original Name.

Using an EXISTS statement gives me all the records I need, but how do I go about creating the new Name value?

SELECT * FROM Table AS T1 
WHERE EXISTS (
    SELECT Name, COUNT(*) AS [Count] 
    FROM Table 
    GROUP BY Name 
    HAVING (COUNT(*) > 1) AND (Name = T1.Name))
ORDER BY Name

I need to create an UPDATE statement that will fix all the duplicates, i.e. change the Name as per this pattern.

Update: Figured it out now. It was the PARTITION BY clause I was missing.

回答1:

With Dups As
    (
    Select Id, Name
        , Row_Number() Over ( Partition By Name Order By Id ) As Rnk
    From Table
    )
Select D.Id
    , D.Name + Case
                When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
                Else ''
                End As Name
From Dups As D

If you want an update statement you can use pretty much the same structure:

With Dups As
    (
    Select Id, Name
        , Row_Number() Over ( Partition By Name Order By Id ) As Rnk
    From Table
    )
Update Table
Set Name = T.Name + Case
                    When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
                    Else ''
                    End
From Table As T
    Join Dups As D
        On D.Id = T.Id


回答2:

Just update the subquery directly:

update d
set Name = Name+'('+cast(r as varchar(10))+')'
from    (   select  Name, 
                    row_number() over (partition by Name order by Name) as r
            from    [table]
        ) d
where r > 1


回答3:

SELECT ROW_NUMBER() OVER(ORDER BY Name) AS RowNum,
       Name,
       Name + '(' + ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) + ')' concatenatedName
FROM Table 
WHERE Name IN 
(
     SELECT Name 
     FROM Table 
     GROUP BY Name 
     HAVING COUNT(*) > 1
)

This will get you what you originally asked for. For the update statement, you'll want to do a while and update the top 1

DECLARE @Pointer VARCHAR(20), @Count INT

WHILE EXISTS(SELECT Name FROM Table GROUP BY Name HAVING COUNT(1) > 1)
BEGIN
    SELECT TOP 1 @Pointer = Name, @Count = COUNT(1) FROM Table GROUP BY Name HAVING COUNT(1) > 1
    UPDATE TOP (1) TABLE
    SET Name = Name + '(' + @Count + ')'
    WHERE Name = @Pointer
END


回答4:

There's no need to do an UPDATE at all. The following will create the table for INSERT as desired

SELECT
    ROW_NUMBER() OVER(ORDER BY tb2.Id) Id,
    tb2.Name + CASE WHEN COUNT(*) > 1 THEN ' (' + CONVERT(VARCHAR, Count(*)) + ')' ELSE '' END [Name]
FROM
    tb tb1,
    tb tb2
WHERE
    tb1.Name = tb2.Name AND
    tb1.Id <= tb2.Id
GROUP BY
    tb2.Name,
    tb2.Id


回答5:

Here's an even simpler UPDATE statement:

UPDATE
    tb
SET
    [Name] = [Name] + ' (' + CONVERT(VARCHAR, ROW_NUMBER () OVER (PARTITION BY [Name] ORDER BY Id)) + ')'
WHERE
    ROW_NUMBER () OVER (PARTITION BY [Name] ORDER BY Id) > 1