Deleting duplicates from a large table

I have a fairly large table with 19,000,000 records, and I have a problem with duplicate rows. There are a lot of similar questions, even here on SO, but none of them seem to give me a satisfactory answer. Some points to consider:

Row uniqueness is determined by two columns, location_id and datetime.
I'd like to keep the execution time as fast as possible (< 1 hour).
Copying tables is not very feasible as the table is several gigabytes in size.
No need to worry about relations.

As said, each location_id should appear at most once for any given datetime, and I would like to remove all the duplicate instances. It does not matter which one of them survives, as the data is identical.

Any ideas?

Tags: mysql unique duplicates

5 answers

你好瞎i

#2 · 2019-02-08 13:34

I think you can use this query to delete the duplicate records from the table:

ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime);

Test it on some sample data first, and only then run it on the real table.

Note: on MySQL 5.5 this works with MyISAM tables but not with InnoDB.
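For reference, ALTER IGNORE TABLE ... ADD UNIQUE keeps the first row it finds for each (location_id, datetime) key and drops the rest. MySQL removed the IGNORE clause of ALTER TABLE in 5.7.4, so this only applies to older servers. A minimal sketch of the same keep-one-per-key behavior follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the mechanism shown is SQLite's INSERT OR IGNORE against a UNIQUE constraint, not MySQL's ALTER IGNORE. The readings table and its value column are hypothetical placeholders for the real schema.

```python
import sqlite3

# Hypothetical schema: the question only names location_id and datetime,
# so "value" stands in for whatever other columns the real table has.
SCHEMA = """
CREATE TABLE readings (
    location_id INTEGER NOT NULL,
    datetime    TEXT    NOT NULL,
    value       REAL,
    UNIQUE (location_id, datetime)
)
"""


def load_without_duplicates(rows):
    """Insert rows into a table that has UNIQUE(location_id, datetime).
    INSERT OR IGNORE keeps the first row for each key and skips the rest."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?, ?)", rows)
    kept = conn.execute(
        "SELECT location_id, datetime, value FROM readings "
        "ORDER BY location_id, datetime"
    ).fetchall()
    conn.close()
    return kept


sample = [
    (1, "2019-02-08 13:00", 5.0),
    (1, "2019-02-08 13:00", 5.0),  # same key as the row above: skipped
    (1, "2019-02-08 14:00", 6.0),
    (2, "2019-02-08 13:00", 7.0),
    (2, "2019-02-08 13:00", 7.0),  # same key as the row above: skipped
]
print(load_without_duplicates(sample))
```

Because the unique key is in place before any rows arrive, every later duplicate is rejected at insert time rather than cleaned up afterwards.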

Explosion°爆炸

#3 · 2019-02-08 13:44

CREATE TABLE table_nodup LIKE table_name;
INSERT INTO table_nodup SELECT DISTINCT * FROM table_name;

DROP TABLE table_name;

RENAME TABLE table_nodup TO table_name;

Because duplicate rows are identical in every column, SELECT DISTINCT keeps exactly one copy of each, and the copy then replaces the original table. I'm not sure about performance; it depends on your table's columns, your server, etc. Keep in mind that this copies the whole table, which the question says is hard at this size.
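The copy-and-swap idea in this answer (copy the distinct rows, drop the original, rename the copy) can be tried on a small scale first. A minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL and a hypothetical readings table; SQLite spells the last step ALTER TABLE ... RENAME TO instead of MySQL's RENAME TABLE.

```python
import sqlite3


def rebuild_without_duplicates(conn):
    """Copy-and-swap: copy the distinct rows of `readings` into a new table,
    drop the original, and rename the copy to take its place."""
    # SELECT DISTINCT * is enough only because duplicate rows are identical
    # in every column, as the question states.
    conn.execute("CREATE TABLE readings_nodup AS SELECT DISTINCT * FROM readings")
    conn.execute("DROP TABLE readings")
    conn.execute("ALTER TABLE readings_nodup RENAME TO readings")
    conn.commit()


# Hypothetical sample table with two duplicated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (location_id INTEGER, datetime TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2019-02-08 13:00", 5.0),
        (1, "2019-02-08 13:00", 5.0),
        (1, "2019-02-08 14:00", 6.0),
        (2, "2019-02-08 13:00", 7.0),
        (2, "2019-02-08 13:00", 7.0),
    ],
)
conn.commit()

rebuild_without_duplicates(conn)
remaining = conn.execute(
    "SELECT location_id, datetime, value FROM readings ORDER BY location_id, datetime"
).fetchall()
print(remaining)
```

In SQLite, CREATE TABLE ... AS SELECT copies no indexes or constraints, so any keys (ideally a unique key on location_id, datetime) have to be recreated on the new table afterwards.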

家丑人穷心不美

#4 · 2019-02-08 13:51

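SELECT location_id, datetime, COUNT(*) AS cnt
FROM table_name
GROUP BY location_id, datetime
HAVING cnt > 1;

This lists each duplicated (location_id, datetime) pair together with its number of copies; it does not delete anything by itself.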
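A quick way to check the grouping query before running it on the real table is a small in-memory test. A minimal sketch, assuming SQLite through Python's sqlite3 module in place of MySQL and a hypothetical readings table; it shows why the filter must be COUNT(*) > 1, since a pair that occurs exactly twice is already a duplicate.

```python
import sqlite3


def duplicate_keys(conn):
    """List every (location_id, datetime) pair that appears more than once,
    with its number of copies. HAVING COUNT(*) > 1 also catches pairs that
    occur exactly twice, which a "> 2" filter would miss."""
    return conn.execute(
        "SELECT location_id, datetime, COUNT(*) AS copies "
        "FROM readings "
        "GROUP BY location_id, datetime "
        "HAVING COUNT(*) > 1 "
        "ORDER BY location_id, datetime"
    ).fetchall()


# Hypothetical sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (location_id INTEGER, datetime TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2019-02-08 13:00", 5.0),
        (1, "2019-02-08 13:00", 5.0),
        (1, "2019-02-08 13:00", 5.0),  # three copies of this key
        (1, "2019-02-08 14:00", 6.0),  # unique key: not reported
        (2, "2019-02-08 13:00", 7.0),
        (2, "2019-02-08 13:00", 7.0),  # two copies of this key
    ],
)
print(duplicate_keys(conn))
```

Only the selected columns are the grouping columns plus the aggregate, which also keeps the MySQL version valid under ONLY_FULL_GROUP_BY.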

Bombasti

#5 · 2019-02-08 13:58

This query works: I tested it on a MyISAM table with 2 million rows.

ALTER IGNORE TABLE table_name ADD UNIQUE (location_id, datetime)

在下西门庆

#6 · 2019-02-08 13:59

You can delete duplicates using these steps:

1- Export the results of the following query into a txt file:

select dup_col from table1 group by dup_col having count(dup_col) > 1

2- Prepend the following to the contents of that txt file and run the resulting query:

delete from table1 where dup_col in (.....)

Please note that '...' stands for the contents of the txt file created in the first step. Be aware that this deletes every copy of each duplicated value, including the one you want to keep.
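A variant of this delete that keeps one copy of each key needs some column that tells identical rows apart. A minimal sketch, assuming SQLite through Python's sqlite3 module, where the built-in rowid plays that role; this is a swapped-in technique, and on MySQL it would require a unique id column (for example an AUTO_INCREMENT primary key), which the question does not mention. The readings table is hypothetical.

```python
import sqlite3


def delete_duplicates_keep_one(conn):
    """Delete all but one row per (location_id, datetime), keeping the copy
    with the smallest rowid. Returns the number of rows deleted."""
    with conn:  # commits on success, rolls back if the DELETE raises
        cur = conn.execute(
            "DELETE FROM readings WHERE rowid NOT IN ("
            " SELECT MIN(rowid) FROM readings GROUP BY location_id, datetime)"
        )
    return cur.rowcount


# Hypothetical sample table with two duplicated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (location_id INTEGER, datetime TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (1, "2019-02-08 13:00", 5.0),
        (1, "2019-02-08 13:00", 5.0),
        (1, "2019-02-08 14:00", 6.0),
        (2, "2019-02-08 13:00", 7.0),
        (2, "2019-02-08 13:00", 7.0),
    ],
)
conn.commit()

deleted = delete_duplicates_keep_one(conn)
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(deleted, remaining)
```

Unlike exporting the values to a file, this runs entirely inside the database and never touches the surviving copy.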
