I made a mistake in a bulk insert script, so now I have "duplicate" rows with different colX. I need to delete these duplicate rows, but I can't figure out how. To be more precise, I have this:
col1 | col2 | col3 | colX
----+----------------------
0 | 1 | 2 | a
0 | 1 | 2 | b
0 | 1 | 2 | c
0 | 1 | 2 | a
3 | 4 | 5 | x
3 | 4 | 5 | y
3 | 4 | 5 | x
3 | 4 | 5 | z
and I want to keep only the first occurrence of each (col1, col2, col3) combination, along with its colX:
col1 | col2 | col3 | colX
----+----------------------
0 | 1 | 2 | a
3 | 4 | 5 | x
Thank you for your replies :)
I assume you're using SQL Server 2005/2008.
The simplest solution could be as follows. Suppose we have a table emp_dept(empid, deptid) that contains duplicate rows. On an Oracle database:
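A minimal sketch of what such an Oracle statement might look like, assuming the only columns that define a duplicate are empid and deptid (the aliases a and b are just illustrative). It relies on Oracle's built-in rowid pseudocolumn to tell otherwise identical rows apart:

```sql
-- Oracle: delete every row that has a twin with a smaller rowid.
-- Assumes empid and deptid are NOT NULL; NULLs never compare equal with "=".
DELETE FROM emp_dept a
WHERE EXISTS (
    SELECT 1
    FROM emp_dept b
    WHERE b.empid  = a.empid
      AND b.deptid = a.deptid
      AND b.rowid  < a.rowid   -- an "earlier" copy exists, so this one goes
);
```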
On SQL Server, or any database that does not support a rowid-like feature, we need to add an identity column just to identify each row. Say we have added nid as an identity column to the table.
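Adding the identity column might look like the following on SQL Server. This is only a sketch: the INT type and the seed/increment of 1 are assumptions, and nid is the column name used in this answer.

```sql
-- SQL Server: existing rows are numbered automatically when the column is added.
ALTER TABLE emp_dept
    ADD nid INT IDENTITY(1, 1) NOT NULL;
```

The numbers assigned to existing rows do not necessarily reflect the order in which the rows were originally inserted.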
Now the query to delete the duplicates could be written as:
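One possible form of that query in T-SQL, assuming the emp_dept table and the nid identity column described above (the aliases a and b are illustrative):

```sql
-- SQL Server: keep the row with the smallest nid in each (empid, deptid) group.
DELETE a
FROM emp_dept AS a
WHERE EXISTS (
    SELECT 1
    FROM emp_dept AS b
    WHERE b.empid  = a.empid
      AND b.deptid = a.deptid
      AND b.nid    < a.nid     -- a copy with a lower identity value exists
);
```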
The idea is to delete every row for which another row exists with the same core values but a smaller rowid or identity. So if duplicate rows exist, the ones with the higher rowid or identity get deleted. For a row with no duplicate, the search for a lower rowid finds nothing, so the row is not deleted.
I would suggest using a CTE and copying all non-duplicate records into a separate table if you have many duplicates. However, there is a recommended post to follow: MSDN
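As a rough sketch of that suggestion, assuming the table is called Duplicates and using a hypothetical target table Duplicates_clean: the CTE numbers the rows inside each (col1, col2, col3) group and only the first one is copied. Since the table has no column recording insertion order, "first" here is taken to mean the smallest colX.

```sql
-- SQL Server: copy one row per (col1, col2, col3) group into a new table.
WITH ranked AS (
    SELECT col1, col2, col3, colX,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                              ORDER BY colX) AS rn
    FROM Duplicates
)
SELECT col1, col2, col3, colX
INTO Duplicates_clean          -- hypothetical name; SELECT ... INTO creates it
FROM ranked
WHERE rn = 1;
```

When most of the rows are duplicates, copying the survivors can be cheaper than deleting the rest, and the original table stays untouched until the result has been checked.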
Try this code, but at your own risk:
The second method uses row_number(); this is the safe method:
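A minimal sketch of a row_number()-based delete, assuming SQL Server 2005 or later and a table named Duplicates with the columns from the question. The ordering by colX is an assumption about which row counts as the one to keep.

```sql
-- SQL Server: deleting from the CTE removes the underlying table rows.
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                              ORDER BY colX) AS rn
    FROM Duplicates
)
DELETE FROM numbered
WHERE rn > 1;                  -- everything after the first row of each group
```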
Assuming colX is unique (which is not the case in your example, even though you said "different colX") you could use the following to delete the duplicates:
(Let's say your table is named "Duplicates")
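One way this could be written, as a sketch that assumes colX is unique across the whole table: a self-join finds, for each row, any row in the same (col1, col2, col3) group with a smaller colX, and deletes the row if such a match exists. The aliases d and k are illustrative.

```sql
-- SQL Server: keep only the row with the smallest colX in each group.
DELETE d
FROM Duplicates AS d
JOIN Duplicates AS k
  ON  k.col1 = d.col1
  AND k.col2 = d.col2
  AND k.col3 = d.col3
  AND k.colX < d.colX;         -- a row with a smaller colX exists in the group
```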
If colX is not unique, add a new uniqueidentifier column, insert distinct values into it and then use the code above by joining on that column instead of colX.
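Adding such a column might look like the sketch below. The column name row_guid is hypothetical, and GO is the batch separator understood by SSMS and sqlcmd, needed here so that the UPDATE is compiled after the column exists.

```sql
-- SQL Server: add a nullable uniqueidentifier column...
ALTER TABLE Duplicates
    ADD row_guid UNIQUEIDENTIFIER NULL;
GO

-- ...then give every existing row its own value (NEWID() is evaluated per row).
UPDATE Duplicates
SET row_guid = NEWID();
```

Note that MIN() and MAX() do not accept uniqueidentifier values in SQL Server, so a delete that compares the new column with < or > is easier to adapt than one that aggregates it.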
Try the simplest approach with SQL Server's CTE: http://www.sqlfiddle.com/#!3/2d386/2
Data:
Solution:
Output:
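The contents of the linked fiddle are not reproduced here. As an illustration only, the following sketch rebuilds the data from the question in a table assumed to be named Duplicates, runs a CTE delete inside a transaction, and lists what remains. The multi-row VALUES syntax needs SQL Server 2008 or later.

```sql
-- Data: the sample rows from the question.
CREATE TABLE Duplicates (col1 INT, col2 INT, col3 INT, colX CHAR(1));

INSERT INTO Duplicates (col1, col2, col3, colX) VALUES
    (0, 1, 2, 'a'), (0, 1, 2, 'b'), (0, 1, 2, 'c'), (0, 1, 2, 'a'),
    (3, 4, 5, 'x'), (3, 4, 5, 'y'), (3, 4, 5, 'x'), (3, 4, 5, 'z');

-- Solution: wrapped in a transaction so the result can be inspected first.
BEGIN TRANSACTION;

WITH numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                              ORDER BY colX) AS rn
    FROM Duplicates
)
DELETE FROM numbered
WHERE rn > 1;

-- Output: two rows remain, (0, 1, 2, a) and (3, 4, 5, x).
SELECT col1, col2, col3, colX
FROM Duplicates
ORDER BY col1;

COMMIT TRANSACTION;            -- or ROLLBACK TRANSACTION if the result looks wrong
```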
Or perhaps this: http://www.sqlfiddle.com/#!3/af826/1
Data:
Solution:
Output: