I am in the middle of loading data from a third-party source into my database. Unfortunately, the third-party data contains many duplicate records.
I looked at a few questions here on SO, but all of them seem to cover cases where there is an ID column that distinguishes one row from another.
In my case, there is no ID column. For example:
State   City     SubDiv   Pincode  Locality  Lat      Long
Orissa  Koraput  Jeypore  764001   B.D.Pur   18.7743  82.5693
Orissa  Koraput  Jeypore  764001   Jeypore   18.7743  82.5693
Orissa  Koraput  Jeypore  764001   Jeypore   18.7743  82.5693
Orissa  Koraput  Jeypore  764001   Jeypore   18.7743  82.5693
Orissa  Koraput  Jeypore  764001   Jeypore   18.7743  82.5693
Is there a simple query I can run to delete all duplicate records and keep one record as the original? In the example above, I want to delete rows 3, 4 and 5 from the table.
I am not sure whether this can be done with simple SQL statements, but I would like to hear how others would approach it.
Try this
You may use the ROW_NUMBER() function; see "SQL SERVER – 2005 – 2008 – Delete Duplicate Rows".
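A minimal sketch of the ROW_NUMBER() approach, assuming SQL Server 2005 or later. The table name dbo.Locations is hypothetical; the column names are taken from the sample data in the question. Rows are numbered within each group of identical values, and every row after the first in a group is deleted.

```sql
-- Assumption: the table is called dbo.Locations (hypothetical name).
-- Number the rows inside each group of fully identical values.
WITH Ranked AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY [State], [City], [SubDiv], [Pincode],
                            [Locality], [Lat], [Long]
               ORDER BY (SELECT NULL)  -- no ID column, so any order will do
           ) AS rn
    FROM dbo.Locations
)
-- Deleting through the CTE removes the underlying table rows.
-- rn = 1 is the copy that is kept; everything else is a duplicate.
DELETE FROM Ranked
WHERE rn > 1;
```

Running the CTE with SELECT * in place of DELETE first shows which rows would be removed. For the sample data, the four identical Jeypore rows form one group, so three of them get rn > 1.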
I would insert the third-party data into a temporary table, then copy the rows from it into the target table using a distinct select,
and finally delete the temporary table.
Only distinct (unique) rows will be inserted into the target table.
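The temporary-table steps could look roughly like this in T-SQL. The names #Staging and dbo.Locations, and all column types, are assumptions made for illustration; only the column names come from the question.

```sql
-- Assumption: #Staging and dbo.Locations are hypothetical names,
-- and the column types are guesses based on the sample data.
CREATE TABLE #Staging (
    [State]    varchar(50),
    [City]     varchar(50),
    [SubDiv]   varchar(50),
    [Pincode]  varchar(10),
    [Locality] varchar(100),
    [Lat]      decimal(9, 4),
    [Long]     decimal(9, 4)
);

-- Step 1: load the raw third-party data into #Staging
-- (for example with BULK INSERT or an import tool; not shown here).

-- Step 2: copy only the distinct rows into the target table.
INSERT INTO dbo.Locations
    ([State], [City], [SubDiv], [Pincode], [Locality], [Lat], [Long])
SELECT DISTINCT
    [State], [City], [SubDiv], [Pincode], [Locality], [Lat], [Long]
FROM #Staging;

-- Step 3: delete the temporary table.
DROP TABLE #Staging;
```

Note that DISTINCT only removes duplicates within the staged batch. If the target table may already hold some of these rows, the SELECT would also need to exclude them, for instance with EXCEPT against the target.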
One of
SELECT DISTINCT * INTO ANewTable FROM OldTable
and then rename, etc. And then add a unique index on the desired columns.
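A sketch of the rename and unique-index steps, assuming SQL Server. The backup table name, the index name, and the choice of index columns are all assumptions; pick whichever columns should identify a row in your data.

```sql
-- Assumption: OldTable_Backup and UX_OldTable_Location are hypothetical names.
-- Keep the original table under a backup name instead of dropping it.
EXEC sp_rename 'OldTable', 'OldTable_Backup';

-- Give the de-duplicated copy the original name.
EXEC sp_rename 'ANewTable', 'OldTable';

-- Prevent future duplicates; the column list here is an assumption.
CREATE UNIQUE INDEX UX_OldTable_Location
    ON OldTable ([State], [City], [SubDiv], [Pincode], [Locality]);
```

SELECT ... INTO does not copy indexes or constraints from the source table, so any that existed on the original have to be recreated on the new one. Once the unique index is in place, later loads that contain a duplicate will fail with an error rather than insert it silently.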