Here's a simplified example of my problem. I have a table where there's a "Name" column with duplicate entries:
ID Name
--- ----
1 AAA
2 AAA
3 AAA
4 BBB
5 CCC
6 CCC
7 DDD
8 DDD
9 DDD
10 DDD
Doing a GROUP BY like SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name
results in this:
Name Count
---- -----
AAA 3
BBB 1
CCC 2
DDD 4
I'm only concerned about the duplicates, so I'll add a HAVING clause, SELECT Name, COUNT(*) AS [Count] FROM Table GROUP BY Name HAVING COUNT(*) > 1
:
Name Count
---- -----
AAA 3
CCC 2
DDD 4
Trivial so far, but now things get tricky: I need a query to get me all the duplicate records, but with a nice incrementing indicator added to the Name column. The result should look something like this:
ID Name
--- --------
1 AAA
2 AAA (2)
3 AAA (3)
5 CCC
6 CCC (2)
7 DDD
8 DDD (2)
9 DDD (3)
10 DDD (4)
Note row 4 with "BBB" is excluded, and the first duplicate keeps the original Name.
Using an EXISTS
statement gives me all the records I need, but how do I go about creating the new Name value?
SELECT * FROM Table AS T1
WHERE EXISTS (
SELECT Name, COUNT(*) AS [Count]
FROM Table
GROUP BY Name
HAVING (COUNT(*) > 1) AND (Name = T1.Name))
ORDER BY Name
I need to create an UPDATE statement that will fix all the duplicates, i.e. change the Name as per this pattern.
Update:
Figured it out now. It was the PARTITION BY clause I was missing.
With Dups As
(
Select Id, Name
, Row_Number() Over ( Partition By Name Order By Id ) As Rnk
From Table
)
Select D.Id
, D.Name + Case
When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
Else ''
End As Name
From Dups As D
If you want an update statement you can use pretty much the same structure:
With Dups As
(
Select Id, Name
, Row_Number() Over ( Partition By Name Order By Id ) As Rnk
From Table
)
Update Table
Set Name = T.Name + Case
When D.Rnk > 1 Then ' (' + Cast(D.Rnk As varchar(10)) + ')'
Else ''
End
From Table As T
Join Dups As D
On D.Id = T.Id
Just update the subquery directly:
update d
set Name = Name+'('+cast(r as varchar(10))+')'
from ( select Name,
row_number() over (partition by Name order by Name) as r
from [table]
) d
where r > 1
SELECT ROW_NUMBER() OVER(ORDER BY Name) AS RowNum,
Name,
Name + '(' + ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name) + ')' concatenatedName
FROM Table
WHERE Name IN
(
SELECT Name
FROM Table
GROUP BY Name
HAVING COUNT(*) > 1
)
This will get you what you originally asked for. For the update statement, you'll want to do a while and update the top 1
DECLARE @Pointer VARCHAR(20), @Count INT
WHILE EXISTS(SELECT Name FROM Table GROUP BY Name HAVING COUNT(1) > 1)
BEGIN
SELECT TOP 1 @Pointer = Name, @Count = COUNT(1) FROM Table GROUP BY Name HAVING COUNT(1) > 1
UPDATE TOP (1) TABLE
SET Name = Name + '(' + @Count + ')'
WHERE Name = @Pointer
END
There's no need to do an UPDATE
at all. The following will create the table for INSERT
as desired
SELECT
ROW_NUMBER() OVER(ORDER BY tb2.Id) Id,
tb2.Name + CASE WHEN COUNT(*) > 1 THEN ' (' + CONVERT(VARCHAR, Count(*)) + ')' ELSE '' END [Name]
FROM
tb tb1,
tb tb2
WHERE
tb1.Name = tb2.Name AND
tb1.Id <= tb2.Id
GROUP BY
tb2.Name,
tb2.Id
Here's an even simpler UPDATE statement:
UPDATE
tb
SET
[Name] = [Name] + ' (' + CONVERT(VARCHAR, ROW_NUMBER () OVER (PARTITION BY [Name] ORDER BY Id)) + ')'
WHERE
ROW_NUMBER () OVER (PARTITION BY [Name] ORDER BY Id) > 1