Delete duplicate records from a table without a primary key

Posted 2019-02-24 11:18

Question:

I need to delete all the duplicate records from one of my tables. The problem is that there isn't any id, unique, or key column, so I can't do something like this:

delete from tbl using tbl,tbl t2 where tbl.locationID=t2.locationID
  and tbl.linkID=t2.linkID  and tbl.ID>t2.ID

because that needs an id, unique, or key column. I also can't run an

ALTER IGNORE TABLE `mytable` ADD UNIQUE INDEX

because some of the information will always be necessarily duplicated while the rest won't. I also can't do this:

DELETE FROM `table` WHERE `field` IN (SELECT `field` FROM `table` GROUP BY `field` HAVING (COUNT(`field`) > 1))

because it would delete every duplicated row and never leave one copy. This is an example of my table:


+----------+----------------------+-------------+-------------+
| phone    | address              | name        | cellphone   |
+----------+----------------------+-------------+-------------+
| 2555555  | 1020 PANORAMA        | JUAN CARLOS | 0999999999  | different address
| 2555555  | GABRIEL JOSE 1020    | JUAN CARLOS | 0999999999  | good one
| 2555555  | GABRIEL JOSE 1020    | JUAN CARLOS | 0999999999  | duplicated
| 2555555  | C ATARAZANA 1020     | SILVIA      | 0777777777  | another good one
| 2555555  | C ATARAZANA 1020     | SILVIA      | 0777777777  | another duplicated
| 2555555  | GABRIEL JOSE 1020    | VIOLETA     | 0888888888  | different person
+----------+----------------------+-------------+-------------+

And this is what I want to keep:


+----------+----------------------+--------------+-------------+
| phone    | address              | name         | cellphone   |
+----------+----------------------+--------------+-------------+
| 2555555  | 1020 PANORAMA        | JUAN CARLOS  | 0999999999  |
| 2555555  | GABRIEL JOSE 1020    | JUAN CARLOS  | 0999999999  |
| 2555555  | C ATARAZANA 1020     | SILVIA       | 0777777777  |
| 2555555  | GABRIEL JOSE 1020    | VIOLETA      | 0888888888  |
+----------+----------------------+--------------+-------------+

I can't truncate or delete the original table because it's used 24/7 and has 10,000,000 records.

Please help me.

Answer 1:

Adding a unique index (with all the columns of the table) with ALTER IGNORE will get rid of the duplicates:

ALTER IGNORE TABLE table_name
  ADD UNIQUE INDEX all_columns_uq
    (phone, address, name, cellphone) ;

Tested in SQL-Fiddle.

Note: In version 5.5 (due to a bug in the implementation of fast index creation), the above will work only if you provide this setting before the ALTER:

SET SESSION old_alter_table=1 ;
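
On newer servers the statement above is no longer an option: the IGNORE clause of ALTER TABLE was deprecated in MySQL 5.6.17 and removed in MySQL 5.7. A rough equivalent is to build a de-duplicated copy next to the original and swap the names. This is only a sketch under stated assumptions: the table is called mytable as in the question, the names mytable_dedup and mytable_old are hypothetical, and no column contains NULL (a UNIQUE index treats NULLs as distinct, so such duplicates would survive).

```sql
-- Hypothetical helper names: mytable_dedup, mytable_old.

-- 1. Empty copy of the original table with the same columns and indexes.
CREATE TABLE mytable_dedup LIKE mytable;

-- 2. Unique index over every column, as in the ALTER IGNORE statement above.
ALTER TABLE mytable_dedup
  ADD UNIQUE INDEX all_columns_uq (phone, address, name, cellphone);

-- 3. Copy the rows; IGNORE skips any row that collides with the unique index,
--    so only the first copy of each duplicate group is kept.
INSERT IGNORE INTO mytable_dedup (phone, address, name, cellphone)
SELECT phone, address, name, cellphone
FROM mytable;

-- 4. Swap both names in a single atomic statement.
RENAME TABLE mytable TO mytable_old,
             mytable_dedup TO mytable;
```

Rows written to mytable between steps 3 and 4 would end up only in mytable_old, so on a table that is used 24/7 those writes would need to be paused or replayed afterwards. Long text columns may also exceed the index key length limit, in which case the unique index cannot cover the full row.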


Answer 2:

It's pretty simple: copy the distinct rows into a temporary table, empty the original table, then refill it from the temporary one.

CREATE TEMPORARY TABLE IF NOT EXISTS no_dupes AS 
(SELECT * FROM test GROUP BY phone, address, name, cellphone);

TRUNCATE table test;
INSERT INTO test (phone, address, name, cellphone) 
SELECT phone, address, name, cellphone FROM no_dupes;

WORKING DEMO
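
After a rebuild like this one, it may be worth checking that no duplicates remain. A minimal sketch, assuming the same test table and column names used in this answer:

```sql
-- Lists every remaining duplicate group and how many copies it has.
-- An empty result means each (phone, address, name, cellphone) row is unique.
SELECT phone, address, name, cellphone, COUNT(*) AS copies
FROM test
GROUP BY phone, address, name, cellphone
HAVING COUNT(*) > 1;
```

On the sample data from the question this would return two groups before the cleanup (the JUAN CARLOS row at GABRIEL JOSE 1020 and the SILVIA row) and none afterwards.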



Answer 3:

I'd use a subquery. Something like:

DELETE FROM table1
WHERE EXISTS (
SELECT field1 
FROM table1 AS subTable1 
WHERE table1.field1 = subTable1.field1 and table1.field2 = subTable1.field2)

I haven't tried this out, though.
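
As written, the EXISTS condition is true for every row, because each row matches itself in the subquery, and MySQL generally rejects a DELETE whose subquery reads from the table being deleted from. When no column tells the copies apart, one possible workaround is to delete all but one copy of each group with DELETE ... LIMIT. A minimal sketch, assuming the question's table is called mytable and its columns hold strings; the literal values are taken from the sample rows:

```sql
-- Step 1: list each duplicated group and how many extra copies it has.
SELECT phone, address, name, cellphone, COUNT(*) - 1 AS extra_copies
FROM mytable
GROUP BY phone, address, name, cellphone
HAVING COUNT(*) > 1;

-- Step 2: for each group returned, delete exactly extra_copies rows.
-- The copies are identical, so it does not matter which ones LIMIT picks.
DELETE FROM mytable
WHERE phone = '2555555'
  AND address = 'GABRIEL JOSE 1020'
  AND name = 'JUAN CARLOS'
  AND cellphone = '0999999999'
LIMIT 1;
```

Each DELETE touches a single group, so the table stays online, but one statement has to be generated per group (from a script or a stored procedure), which may be slow when many groups are duplicated.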



Answer 4:

A table can always be given a PK: you can combine columns into a unique id, so it's even possible to use the full row as the unique id if you want to. I don't recommend using the full row, though. You should work out which are the most significant columns that you can use as a PK. Once you have done that, you can copy the data into a table with that key, and if all goes well MySQL won't copy the duplicate rows (a plain INSERT stops with a duplicate-key error, so the copy needs INSERT IGNORE to skip them).

Sorry for my bad English.