Removing duplicate rows from table in Oracle

2019-01-01 01:24发布

I'm testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.

How can I delete all duplicate rows and leave only one of them?

21条回答
路过你的时光
2楼-- · 2019-01-01 02:04
delete from dept
where rowid in (
     select rowid
     from dept
     minus
     select max(rowid)
     from dept
     group by DEPTNO, DNAME, LOC
);
查看更多
余欢
3楼-- · 2019-01-01 02:06

From Ask Tom

delete from t
 where rowid IN ( select rid
                    from (select rowid rid, 
                                 row_number() over (partition by 
                         companyid, agentid, class , status, terminationdate
                                   order by rowid) rn
                            from t)
                   where rn <> 1);

(fixed the missing parenthesis)

查看更多
流年柔荑漫光年
4楼-- · 2019-01-01 02:10

You should do a small pl/sql block using a cursor for loop and delete the rows you don't want to keep. For instance:

declare
prev_var my_table.var1%TYPE;

begin

for t in (select var1 from my_table order by var 1) LOOP

-- if previous var equal current var, delete the row, else keep on going.
end loop;

end;
查看更多
忆尘夕之涩
5楼-- · 2019-01-01 02:10

5. solution

delete from emp where rowid in 
    (
      select  rid from
       (
         select rowid rid,rank() over (partition by emp_id order by rowid)rn from emp     
       )
     where rn > 1
    );
查看更多
不再属于我。
6楼-- · 2019-01-01 02:10

I didn't see any answers that use common table expressions and window functions. This is what I find easiest to work with.

DELETE FROM
 YourTable
WHERE
 ROWID IN
    (WITH Duplicates
          AS (SELECT
               ROWID RID, 
               ROW_NUMBER() 
               OVER(
               PARTITION BY First_Name, Last_Name, Birth_Date)
                  AS RN
               SUM(1)
               OVER(
               PARTITION BY First_Name, Last_Name, Birth_Date
               ORDER BY ROWID ROWS BETWEEN UNBOUNDED PRECEDING 
                                       AND UNBOUNDED FOLLOWING)
                   AS CNT
              FROM
               YourTable
              WHERE
               Load_Date IS NULL)
     SELECT
      RID
     FROM
      duplicates
     WHERE
      RN > 1);

Somethings to note:

1) We are only checking for duplication on the fields in the partition clause.

2) If you have some reason to pick one duplicate over others you can use an order by clause to make that row will have row_number() = 1

3) You can change the number duplicate preserved by changing the final where clause to "Where RN > N" with N >= 1 (I was thinking N = 0 would delete all rows that have duplicates, but it would just delete all rows).

4) Added the Sum partition field the CTE query which will tag each row with the number rows in the group. So to select rows with duplicates, including the first item use "WHERE cnt > 1".

查看更多
人间绝色
7楼-- · 2019-01-01 02:11

The Fastest way for really big tables

  1. Create exception table with structure below: exceptions_table

    ROW_ID ROWID
    OWNER VARCHAR2(30)
    TABLE_NAME VARCHAR2(30)
    CONSTRAINT VARCHAR2(30)
    
  2. Try create a unique constraint or primary key which will be violated by the duplicates. You will get an error message because you have duplicates. The exceptions table will contain the rowids for the duplicate rows.

    alter table add constraint
    unique --or primary key
    (dupfield1,dupfield2) exceptions into exceptions_table;
    
  3. Join your table with exceptions_table by rowid and delete dups

    delete original_dups where rowid in (select ROW_ID from exceptions_table);
    
  4. If the amount of rows to delete is big, then create a new table (with all grants and indexes) anti-joining with exceptions_table by rowid and rename the original table into original_dups table and rename new_table_with_no_dups into original table

    create table new_table_with_no_dups AS (
        select field1, field2 ........ 
        from original_dups t1
        where not exists ( select null from exceptions_table T2 where t1.rowid = t2.row_id )
    )
    
查看更多
登录 后发表回答