How to delete duplicate rows without unique identi-第2页回答

I have duplicate rows in my table and I want to delete duplicates in the most efficient way since the table is big. After some research, I have come up with this query:

WITH TempEmp AS
(
SELECT name, ROW_NUMBER() OVER(PARTITION by name, address, zipcode ORDER BY name) AS duplicateRecCount
FROM mytable
)
-- Now Delete Duplicate Records
DELETE FROM TempEmp
WHERE duplicateRecCount > 1;

But it only works in SQL, not in Netezza. It would seem that it does not like the DELETE after the WITH clause?

标签： sql postgresql duplicates netezza

9条回答

Emotional °昔

2楼-- · 2019-01-08 12:12

We can use a window function for very effective removal of duplicate rows:

DELETE FROM tab 
  WHERE id IN (SELECT id 
                  FROM (SELECT row_number() OVER (PARTITION BY column_with_duplicate_values), id 
                           FROM tab) x 
                 WHERE x.row_number > 1);

Some PostgreSQL's optimized version (with ctid):

DELETE FROM tab 
  WHERE ctid = ANY(ARRAY(SELECT ctid 
                  FROM (SELECT row_number() OVER (PARTITION BY column_with_duplicate_values), ctid 
                           FROM tab) x 
                 WHERE x.row_number > 1));

0人赞添加讨论(0) 举报

对你真心纯属浪费

3楼-- · 2019-01-08 12:20

If you have no other unique identifier, you can use ctid:

delete from mytable
    where exists (select 1
                  from mytable t2
                  where t2.name = mytable.name and
                        t2.address = mytable.address and
                        t2.zip = mytable.zip and
                        t2.ctid > mytable.ctid
                 );

It is a good idea to have a unique, auto-incrementing id in every table. Doing a delete like this is one important reason why.

0人赞添加讨论(0) 举报

爱情/是我丢掉的垃圾

4楼-- · 2019-01-08 12:20

From the documentation delete duplicate rows

A frequent question in IRC is how to delete rows that are duplicates over a set of columns, keeping only the one with the lowest ID. This query does that for all rows of tablename having the same column1, column2, and column3.

DELETE FROM tablename
WHERE id IN (SELECT id
          FROM (SELECT id,
                         ROW_NUMBER() OVER (partition BY column1, column2, column3 ORDER BY id) AS rnum
                 FROM tablename) t
          WHERE t.rnum > 1);

Sometimes a timestamp field is used instead of an ID field.

0人赞添加讨论(0) 举报

上一页 1 2

How to delete duplicate rows without unique identi

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间