How can I delete duplicate rows
where no unique row id
exists?
My table is
col1 col2 col3 col4 col5 col6 col7
john 1 1 1 1 1 1
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
sally 2 2 2 2 2 2
I want to be left with the following after the duplicate removal:
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
I've tried a few queries but i think they depend on a row id as I don't get desired result. For example:
DELETE FROM table WHERE col1 IN (
SELECT id FROM table GROUP BY id HAVING ( COUNT(col1) > 1 )
)
After trying the suggested solution above, that works for small medium tables. I can suggest that solution for very large tables. since it runs in iterations.
LargeSourceTable
sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp'; GO
LargeSourceTable
again, but now, add a primary key with all the columns that define the duplications addWITH (IGNORE_DUP_KEY = ON)
For example:
CREATE TABLE [dbo].[LargeSourceTable] ( ID int IDENTITY(1,1), [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL, [Column1] CHAR (36) NOT NULL, [Column2] NVARCHAR (100) NOT NULL, [Column3] CHAR (36) NOT NULL, PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON) ); GO
Create again the views that you dropped in the first place for the new created table
Now, Run the following sql script, you will see the results in 1,000,000 rows per page, you can change the row number per page to see the results more often.
Note, that I set the
IDENTITY_INSERT
on and off because one the columns contains auto incremental id, which I'm also copyingSET IDENTITY_INSERT LargeSourceTable ON DECLARE @PageNumber AS INT, @RowspPage AS INT DECLARE @TotalRows AS INT declare @dt varchar(19) SET @PageNumber = 0 SET @RowspPage = 1000000
select @TotalRows = count (*) from LargeSourceTable_TEMP
SET IDENTITY_INSERT LargeSourceTable OFF
I would prefer CTE for deleting duplicate rows from sql server table
strongly recommend to follow this article ::http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/
Microsoft has a vey ry neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444
In brief, here is the easiest way to delete duplicates when you have just a few rows to delete:
myprimarykey is the identifier for the row.
I set rowcount to 1 because I only had two rows that were duplicated. If I had had 3 rows duplicated then I would have set rowcount to 2 so that it deletes the first two that it sees and only leaves one in table t1.
Hope it helps anyone
With reference to https://support.microsoft.com/en-us/help/139444/how-to-remove-duplicate-rows-from-a-table-in-sql-server
The idea of removing duplicate involves
Step-by-step
Without using
CTE
andROW_NUMBER()
you can just delete the records just by using group by withMAX
function here is and example