I have duplicate rows in my table and I want to delete duplicates in the most efficient way since the table is big. After some research, I have come up with this query:
WITH TempEmp AS
(
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name, address, zipcode ORDER BY name) AS duplicateRecCount
    FROM mytable
)
-- Now delete duplicate records
DELETE FROM TempEmp
WHERE duplicateRecCount > 1;
But it only works in SQL Server, not in Netezza. It seems that Netezza does not like the DELETE after the WITH clause?
We can use a window function to remove duplicate rows efficiently:
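A minimal sketch of that idea, reusing mytable and the name, address, zipcode columns from the question. It assumes the table also has a unique id column, which the question does not show, and it has not been checked against Netezza:

```sql
-- Assumption: mytable has a unique id column (hypothetical; not in the question).
DELETE FROM mytable
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY name, address, zipcode ORDER BY id) AS rn
        FROM mytable
    ) numbered
    WHERE rn > 1  -- rn = 1 is the one row kept in each group of duplicates
);
```

Because the DELETE targets the base table rather than the CTE, it does not rely on deleting through a WITH clause.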
A PostgreSQL-optimized version (using ctid):
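A sketch of what such a ctid-based variant could look like, again using the question's table and column names as stand-ins. ctid is a PostgreSQL system column that identifies a row's physical location, so this form is PostgreSQL-specific:

```sql
-- PostgreSQL only: ctid identifies each physical row, so no id column is needed.
DELETE FROM mytable
WHERE ctid = ANY (ARRAY(
    SELECT ctid
    FROM (
        SELECT ctid,
               ROW_NUMBER() OVER (PARTITION BY name, address, zipcode) AS rn
        FROM mytable
    ) numbered
    WHERE rn > 1  -- every row after the first in each group is a duplicate
));
```

With no ORDER BY in the window, which row of each group survives is arbitrary. That is harmless here because the rows in a group are identical on the partition columns.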
If you have no other unique identifier, you can use ctid.
It is a good idea to have a unique, auto-incrementing id in every table. Doing a delete like this is one important reason why.
From the documentation on deleting duplicate rows:
A frequent question in IRC is how to delete rows that are duplicates over a set of columns, keeping only the one with the lowest ID. This query does that for all rows of tablename having the same column1, column2, and column3.
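A query matching that description might look like the following sketch. It assumes the ID column is named id and is never NULL; it is one way to get that result, not necessarily the query the documentation gives:

```sql
-- Keep the lowest id in each (column1, column2, column3) group, delete the rest.
-- Assumption: id is unique and NOT NULL (the column name is hypothetical).
DELETE FROM tablename
WHERE id NOT IN (
    SELECT MIN(id)
    FROM tablename
    GROUP BY column1, column2, column3
);
```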
Sometimes a timestamp field is used instead of an ID field.