I'm testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.
How can I delete all duplicate rows and leave only one of them?
From Ask Tom
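Here is a minimal sketch of the rowid-based delete commonly cited from Ask Tom: keep the row with the lowest rowid in each key group and delete the others. It assumes a hypothetical table `test_data` whose intended key is (`col1`, `col2`). It also uses Python's built-in `sqlite3` module as a stand-in for Oracle so it can run anywhere. The DELETE text itself is plain enough that it should also work in Oracle, where ROWID is a pseudocolumn.

```python
import sqlite3

# Hypothetical sample data: (col1, col2) is the intended primary key.
SAMPLE_ROWS = [
    (1, "a", "first"),
    (1, "a", "dup of first"),
    (2, "b", "second"),
    (2, "b", "dup of second"),
    (2, "b", "another dup"),
    (3, "c", "unique"),
]


def make_table() -> sqlite3.Connection:
    """Create an in-memory stand-in for the test table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_data (col1 INTEGER, col2 TEXT, payload TEXT)")
    conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def delete_duplicates(conn: sqlite3.Connection) -> int:
    """Keep the lowest-rowid row of each (col1, col2) group and delete the rest."""
    cur = conn.execute(
        """
        DELETE FROM test_data
        WHERE rowid NOT IN (
            SELECT MIN(rowid)
            FROM test_data
            GROUP BY col1, col2
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows deleted


if __name__ == "__main__":
    conn = make_table()
    print(delete_duplicates(conn), "rows deleted")
    print(conn.execute("SELECT * FROM test_data ORDER BY col1").fetchall())
```

Grouping by every column that should form the key is what defines a "duplicate" here. Add or remove columns in the GROUP BY to match the primary key you want to create.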
You should write a small PL/SQL block with a cursor FOR loop that deletes the rows you don't want to keep.
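PL/SQL can't run outside Oracle, so here is a Python sketch of the same row-by-row idea. It uses `sqlite3` as a stand-in and a hypothetical `test_data` table keyed on (`col1`, `col2`). The code walks the rows in key order, remembers each key it has already seen, and deletes every later copy by rowid. In PL/SQL, that work would happen in the body of a cursor FOR loop issuing `DELETE ... WHERE rowid = ...`. On large tables, a single set-based DELETE is usually much faster than deleting one row at a time.

```python
import sqlite3

# Hypothetical sample data: (col1, col2) is the intended primary key.
SAMPLE_ROWS = [
    (1, "a", "first"),
    (1, "a", "dup of first"),
    (2, "b", "second"),
    (2, "b", "dup of second"),
    (2, "b", "another dup"),
    (3, "c", "unique"),
]


def make_table() -> sqlite3.Connection:
    """Create an in-memory stand-in for the test table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_data (col1 INTEGER, col2 TEXT, payload TEXT)")
    conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def delete_duplicates_row_by_row(conn: sqlite3.Connection) -> int:
    """Mimic a cursor FOR loop: walk rows in key order and delete repeats by rowid."""
    seen = set()
    extra_rowids = []
    # Ordering by rowid inside each key makes the first-inserted copy the keeper.
    rows = conn.execute(
        "SELECT rowid, col1, col2 FROM test_data ORDER BY col1, col2, rowid"
    ).fetchall()
    for rid, col1, col2 in rows:
        key = (col1, col2)
        if key in seen:
            extra_rowids.append((rid,))
        else:
            seen.add(key)
    # One DELETE per unwanted row, as the loop body would do in PL/SQL.
    conn.executemany("DELETE FROM test_data WHERE rowid = ?", extra_rowids)
    conn.commit()
    return len(extra_rowids)


if __name__ == "__main__":
    conn = make_table()
    print(delete_duplicates_row_by_row(conn), "rows deleted")
    print(conn.execute("SELECT * FROM test_data ORDER BY col1").fetchall())
```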
I didn't see any answers that use common table expressions and window functions. This is what I find easiest to work with.
Some things to note:
1) Rows are checked for duplication only on the columns in the PARTITION BY clause.
2) If you have a reason to keep one duplicate over the others, add an ORDER BY clause to the window so that the preferred row gets row_number() = 1.
3) You can change the number of duplicates kept by changing the final WHERE clause to "WHERE RN > N" with N >= 1. (I thought N = 0 would delete all rows that have duplicates, but it would simply delete all rows.)
4) I added a windowed SUM over the same partition to the CTE query, which tags each row with the number of rows in its group (cnt). To select the rows that have duplicates, including the first one, use "WHERE cnt > 1".
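Here is a minimal sketch of the CTE approach. It assumes the hypothetical `test_data` table keyed on (`col1`, `col2`) and uses SQLite (3.25 or newer, for window functions) through Python's `sqlite3` as a stand-in for Oracle. Oracle does not accept a WITH clause in front of a DELETE, so the WITH sits inside the subquery and rows are deleted by rowid. Oracle should accept that form, with a bind like `:n` in place of `?`. `keep_n` plays the role of N from point 3. A windowed COUNT(*) stands in for the sum from point 4, since COUNT(*) and SUM(1) over the same partition give the same number.

```python
import sqlite3

# Hypothetical sample data: (col1, col2) is the intended primary key.
SAMPLE_ROWS = [
    (1, "a", "first"),
    (1, "a", "dup of first"),
    (2, "b", "second"),
    (2, "b", "dup of second"),
    (2, "b", "another dup"),
    (3, "c", "unique"),
]

# rn numbers the rows inside each (col1, col2) group; cnt is the group size.
# ORDER BY rowid decides which copy gets rn = 1 (point 2: change it to prefer another copy).
RANKED_CTE = """
    WITH ranked AS (
        SELECT rowid AS rid,
               col1,
               col2,
               payload,
               ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY rowid) AS rn,
               COUNT(*) OVER (PARTITION BY col1, col2) AS cnt
        FROM test_data
    )
"""


def make_table() -> sqlite3.Connection:
    """Create an in-memory stand-in for the test table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_data (col1 INTEGER, col2 TEXT, payload TEXT)")
    conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def rows_with_duplicates(conn: sqlite3.Connection) -> list:
    """Every row whose group has more than one member, first copy included (cnt > 1)."""
    return conn.execute(
        RANKED_CTE + "SELECT col1, col2, payload FROM ranked WHERE cnt > 1 ORDER BY rid"
    ).fetchall()


def delete_duplicates(conn: sqlite3.Connection, keep_n: int = 1) -> int:
    """Delete every row with rn > keep_n, i.e. keep at most keep_n copies per key."""
    if keep_n < 1:
        raise ValueError("keep_n must be >= 1 (0 would delete every row)")
    cur = conn.execute(
        "DELETE FROM test_data WHERE rowid IN ("
        + RANKED_CTE
        + "SELECT rid FROM ranked WHERE rn > ?)",
        (keep_n,),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = make_table()
    print(rows_with_duplicates(conn))
    print(delete_duplicates(conn, keep_n=2), "rows deleted keeping 2 copies")
    print(delete_duplicates(conn), "rows deleted keeping 1 copy")
```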
The fastest way for really big tables
Create an exceptions table (here called exceptions_table). Oracle ships a script, utlexcpt.sql, that creates one with the required structure.
Try to create a unique constraint or primary key that the duplicates will violate, adding EXCEPTIONS INTO exceptions_table to the statement. You will get an error because of the duplicates, but the exceptions table will then contain the rowids of all the rows involved, including the copy you want to keep.
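Join your table with exceptions_table by rowid and delete the duplicates. Because the exceptions table lists every copy, make sure to keep one row per key (for example, the one with the lowest rowid).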
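The steps above can be sketched as follows. In Oracle, the constraint step would look something like `ALTER TABLE test_data ADD CONSTRAINT test_data_pk PRIMARY KEY (col1, col2) EXCEPTIONS INTO exceptions_table`. That statement fails but logs the offending rowids. SQLite (used here through Python's `sqlite3` as a stand-in) has no EXCEPTIONS INTO clause, so this sketch emulates it with an INSERT that logs every row whose key occurs more than once. The table, owner and constraint names are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (col1, col2) is the intended primary key.
SAMPLE_ROWS = [
    (1, "a", "first"),
    (1, "a", "dup of first"),
    (2, "b", "second"),
    (2, "b", "dup of second"),
    (2, "b", "another dup"),
    (3, "c", "unique"),
]


def make_table() -> sqlite3.Connection:
    """Create an in-memory stand-in for the test table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_data (col1 INTEGER, col2 TEXT, payload TEXT)")
    conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def create_exceptions_table(conn: sqlite3.Connection) -> None:
    """Mirror the columns of Oracle's standard EXCEPTIONS table; "constraint" must be quoted here."""
    conn.execute(
        'CREATE TABLE exceptions_table '
        '(row_id INTEGER, owner TEXT, table_name TEXT, "constraint" TEXT)'
    )


def log_key_violations(conn: sqlite3.Connection) -> None:
    """Emulate EXCEPTIONS INTO: log every row whose (col1, col2) occurs more than once."""
    # 'APP_OWNER' and 'TEST_DATA_PK' are hypothetical placeholder names.
    conn.execute(
        """
        INSERT INTO exceptions_table (row_id, owner, table_name, "constraint")
        SELECT t.rowid, 'APP_OWNER', 'TEST_DATA', 'TEST_DATA_PK'
        FROM test_data t
        WHERE EXISTS (
            SELECT 1 FROM test_data d
            WHERE d.col1 = t.col1 AND d.col2 = t.col2 AND d.rowid <> t.rowid
        )
        """
    )


def delete_logged_duplicates(conn: sqlite3.Connection) -> int:
    """Delete logged rows, except the lowest-rowid copy of each logged key."""
    cur = conn.execute(
        """
        DELETE FROM test_data
        WHERE rowid IN (SELECT row_id FROM exceptions_table)
          AND rowid NOT IN (
              SELECT MIN(t.rowid)
              FROM test_data t
              JOIN exceptions_table e ON e.row_id = t.rowid
              GROUP BY t.col1, t.col2
          )
        """
    )
    return cur.rowcount


def dedupe_via_exceptions(conn: sqlite3.Connection) -> int:
    create_exceptions_table(conn)
    log_key_violations(conn)
    deleted = delete_logged_duplicates(conn)
    # With the duplicates gone the key can be enforced
    # (in Oracle you would rerun the ALTER TABLE ... ADD CONSTRAINT).
    conn.execute("CREATE UNIQUE INDEX test_data_pk ON test_data (col1, col2)")
    conn.commit()
    return deleted


if __name__ == "__main__":
    conn = make_table()
    print(dedupe_via_exceptions(conn), "rows deleted")
    print(conn.execute("SELECT * FROM test_data ORDER BY col1").fetchall())
```

Deleting every logged row without the NOT IN guard would remove all copies of each duplicated key, not just the extras.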
If the number of rows to delete is large, create a new table instead (with all grants and indexes) by anti-joining the original with exceptions_table by rowid and adding back one copy of each duplicated key. Then rename the original table to original_dups and rename new_table_with_no_dups to the original table's name.
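Here is a sketch of the rebuild-and-rename variant under the same assumptions: Python's `sqlite3` stands in for Oracle, the names are hypothetical, and an emulation of EXCEPTIONS INTO fills the exceptions table. The exceptions table holds every copy of a duplicated key, so a pure anti-join would also drop the row you want to keep. For that reason, the new table takes the rows not listed in exceptions_table plus the lowest-rowid copy of each listed key. In Oracle you would also recreate grants, indexes, constraints and triggers on the new table. The sketch skips that step.

```python
import sqlite3

# Hypothetical sample data: (col1, col2) is the intended primary key.
SAMPLE_ROWS = [
    (1, "a", "first"),
    (1, "a", "dup of first"),
    (2, "b", "second"),
    (2, "b", "dup of second"),
    (2, "b", "another dup"),
    (3, "c", "unique"),
]


def make_table_with_exceptions() -> sqlite3.Connection:
    """Load the sample rows and fill exceptions_table the way EXCEPTIONS INTO would."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_data (col1 INTEGER, col2 TEXT, payload TEXT)")
    conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", SAMPLE_ROWS)
    conn.execute("CREATE TABLE exceptions_table (row_id INTEGER)")  # only the column used here
    # Emulated EXCEPTIONS INTO: every copy of a duplicated key gets logged.
    conn.execute(
        """
        INSERT INTO exceptions_table (row_id)
        SELECT t.rowid FROM test_data t
        WHERE EXISTS (
            SELECT 1 FROM test_data d
            WHERE d.col1 = t.col1 AND d.col2 = t.col2 AND d.rowid <> t.rowid
        )
        """
    )
    return conn


def rebuild_without_duplicates(conn: sqlite3.Connection) -> None:
    """Copy the rows to keep into a new table, then swap the table names."""
    conn.execute(
        """
        CREATE TABLE test_data_nodups AS
        SELECT col1, col2, payload
        FROM test_data
        WHERE rowid NOT IN (SELECT row_id FROM exceptions_table)
           OR rowid IN (
               SELECT MIN(t.rowid)
               FROM test_data t
               JOIN exceptions_table e ON e.row_id = t.rowid
               GROUP BY t.col1, t.col2
           )
        """
    )
    # Keep the old table around as test_data_dups until the result is verified.
    conn.execute("ALTER TABLE test_data RENAME TO test_data_dups")
    conn.execute("ALTER TABLE test_data_nodups RENAME TO test_data")
    conn.commit()


if __name__ == "__main__":
    conn = make_table_with_exceptions()
    rebuild_without_duplicates(conn)
    print(conn.execute("SELECT * FROM test_data ORDER BY col1").fetchall())
```

Building a new table avoids logging a huge number of individual row deletions, which is the reason this variant suits tables where most of the work would be deletes.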