Hello I have done the following query below:
UPDATE [dbo].[TestData]
SET Duplicate = 'Duplicate within'
WHERE exists
(SELECT telephone, COUNT(telephone)
FROM [dbo].[TestData]
GROUP BY telephone
HAVING (COUNT (telephone)>1))
In that table there are actually 9 duplicate telephone records.
The query is stamping the entire duplicate column as 'Duplicate within' instead of the 9 records.
The next following query I have also developed which will unstamp the 18 duplicate records to 9.
UPDATE [dbo].[TestData]
SET Duplicate = 'NO'
WHERE ID IN (SELECT MIN(ID) FROM [dbo].[TestData] GROUP BY telephone)
This query is not working neither could anyone please guide me on where I am going wrong!
You could do this using where exists, but it's easier to write/read this way and the performance difference is most likely minimal.
update TestData set
Duplicate = 'Duplicate within'
where
Telephone in (
select Telephone
from TestData
group by Telephone
having count(*) > 1
)
To leave the first record with each telephone number alone and mark only the subsequent records with the same telephone number, use a cte as follows:
;with NumberedDupes as (
select
Telephone,
Duplicate,
row_number() over (partition by Telephone order by Telephone) seq
from TestData
)
update NumberedDupes set Duplicate = 'Duplicate within' where seq > 1
The issue is that the EXISTS
query isn't filtered. It needs to be filtered by each phone #:
UPDATE [dbo].[TestData]
SET Duplicate = 'Duplicate within'
FROM [TestData] t
WHERE EXISTS (
SELECT telephone, COUNT(telephone)
FROM [dbo].[TestData]
WHERE telephone = t.telephone
GROUP BY telephone
HAVING (COUNT (telephone)>1))
)
If you just want to find the duplicates, you need to look at one of the two records as demonstrated by the sub select below. The EXISTS will actually make you update both rows since that is what you are testing.
UPDATE [dbo].[TestData]
SET Duplicate = 'Duplicate within'
WHERE Id IN
(SELECT MAX(Id)
FROM [dbo].[TestData]
GROUP BY telephone
HAVING (COUNT (telephone)>1))