How to delete duplicates on a MySQL table?-第2页回答

2楼-- · 2018-12-31 05:31

delete from `table` where `table`.`SID` in 
    (
    select t.SID from table t join table t1 on t.title = t1.title  where t.SID > t1.SID
)

0人赞添加讨论(0) 举报

刘海飞了

3楼-- · 2018-12-31 05:32

This work for me to remove old records:

delete from table where id in 
(select min(e.id)
    from (select * from table) e 
    group by column1, column2
    having count(*) > 1
);

You can replace min(e.id) to max(e.id) to remove newest records.

0人赞添加讨论(0) 举报

公子世无双

4楼-- · 2018-12-31 05:32

delete p from 
product p
inner join (
    select max(id) as id, url from product 
    group by url 
    having count(*) > 1
) unik on unik.url = p.url and unik.id != p.id;

0人赞添加讨论(0) 举报

唯独是你

5楼-- · 2018-12-31 05:32

This here will make the column column_name into a primary key, and in the meantime ignore all errors. So it will delete the rows with a duplicate value for column_name.

ALTER IGNORE TABLE `table_name` ADD PRIMARY KEY (`column_name`);

0人赞添加讨论(0) 举报

谁念西风独自凉

6楼-- · 2018-12-31 05:35

I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain sql, is very understandable.

As I stated in my comment, that solution hasn't been properly explained though. So this is mine, based on it.

1) add a new boolean column

alter table mytable add tokeep boolean;

2) add a constraint on the duplicated columns AND the new column

alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);

3) set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint

update ignore mytable set tokeep = true;

4) delete rows that have not been marked as tokeep

delete from mytable where tokeep is null;

5) drop the added column

alter table mytable drop tokeep;

I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.

0人赞添加讨论(0) 举报

泛滥B

7楼-- · 2018-12-31 05:37

Deleting duplicate rows in MySQL, walkthrough

Create the table and insert some rows:

dev-db> create table penguins(foo int, bar varchar(15), baz datetime);
Query OK, 0 rows affected (0.07 sec)
dev-db> insert into penguins values(1, 'skipper', now());
dev-db> insert into penguins values(1, 'skipper', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(4, 'rico', now());
Query OK, 6 rows affected (0.07 sec)
dev-db> select * from penguins;
+------+----------+---------------------+
| foo  | bar      | baz                 |
+------+----------+---------------------+
|    1 | skipper  | 2014-08-25 14:21:54 |
|    1 | skipper  | 2014-08-25 14:21:59 |
|    3 | kowalski | 2014-08-25 14:22:09 |
|    3 | kowalski | 2014-08-25 14:22:13 |
|    3 | kowalski | 2014-08-25 14:22:15 |
|    4 | rico     | 2014-08-25 14:22:22 |
+------+----------+---------------------+
6 rows in set (0.00 sec)

Then remove the duplicates:

dev-db> delete a
    -> from penguins a
    -> left join(
    -> select max(baz) maxtimestamp, foo, bar
    -> from penguins
    -> group by foo, bar) b
    -> on a.baz = maxtimestamp and
    -> a.foo = b.foo and
    -> a.bar = b.bar
    -> where b.maxtimestamp IS NULL;
Query OK, 3 rows affected (0.01 sec)

Result:

dev-db> select * from penguins;
+------+----------+---------------------+
| foo  | bar      | baz                 |
+------+----------+---------------------+
|    1 | skipper  | 2014-08-25 14:21:59 |
|    3 | kowalski | 2014-08-25 14:22:15 |
|    4 | rico     | 2014-08-25 14:22:22 |
+------+----------+---------------------+
3 rows in set (0.00 sec)

What's that delete statement doing

Pseudocode: Group the rows by the two columns you want to remove duplicates of. Choose the one row of each group to keep by using the max aggregate. A left join returns all rows from the left table, with the matching rows in the right table. In this case the left table has all rows in the table, and the right only holds those rows that are NULL (not the one row per group you want to keep). Deleting those rows, you are left with only the unique one per group.

More technical explanation, How you should read that sql delete statement:

Table penguins with alias 'a' is left joined on a subset of table penguins called alias 'b'. The right hand table 'b' which is a subset finds the max timestamp grouped by foo and bar. This is matched to left hand table 'a'. (foo,bar,baz) on left has every row in the table. The right hand subset 'b' has a (maxtimestamp,foo,bar) which is matched to left only on the one that IS the max.

Every row that is not that max has value maxtimestamp of NULL. Filter down on those NULL rows and you have a set of all rows grouped by foo and bar that isn't the latest timestamp baz. Delete those ones.

Make a backup of the table before you run this.

Prevent this problem from ever happening again on this table:

If you got this to work, and it put out your "duplicate rows" fire. Great. Your work isn't done yet. Define a new composite unique key on your table (on those two columns) to prevent more duplicates from being added in the first place. Like a good immune system, the bad rows shouldn't even be allowed in to the table at the time of insert. Later on all those programs adding duplicates will broadcast their protest, and when you fix them, this issue never comes up again.

0人赞添加讨论(0) 举报

How to delete duplicates on a MySQL table?

Deleting duplicate rows in MySQL, walkthrough

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间