How to delete duplicates on a MySQL table?

2018-12-31 04:58发布

I need to DELETE duplicated rows for specified sid on a MySQL table.

How can I do this with an SQL query?

DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"

Something like this, but I don't know how to do it.

22条回答
闭嘴吧你
2楼-- · 2018-12-31 05:31
delete from `table` where `table`.`SID` in 
    (
    select t.SID from table t join table t1 on t.title = t1.title  where t.SID > t1.SID
)
查看更多
刘海飞了
3楼-- · 2018-12-31 05:32

This work for me to remove old records:

delete from table where id in 
(select min(e.id)
    from (select * from table) e 
    group by column1, column2
    having count(*) > 1
); 

You can replace min(e.id) to max(e.id) to remove newest records.

查看更多
公子世无双
4楼-- · 2018-12-31 05:32
delete p from 
product p
inner join (
    select max(id) as id, url from product 
    group by url 
    having count(*) > 1
) unik on unik.url = p.url and unik.id != p.id;
查看更多
唯独是你
5楼-- · 2018-12-31 05:32

This here will make the column column_name into a primary key, and in the meantime ignore all errors. So it will delete the rows with a duplicate value for column_name.

ALTER IGNORE TABLE `table_name` ADD PRIMARY KEY (`column_name`);
查看更多
谁念西风独自凉
6楼-- · 2018-12-31 05:35

I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain sql, is very understandable.

As I stated in my comment, that solution hasn't been properly explained though. So this is mine, based on it.

1) add a new boolean column

alter table mytable add tokeep boolean;

2) add a constraint on the duplicated columns AND the new column

alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);

3) set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint

update ignore mytable set tokeep = true;

4) delete rows that have not been marked as tokeep

delete from mytable where tokeep is null;

5) drop the added column

alter table mytable drop tokeep;

I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.

查看更多
泛滥B
7楼-- · 2018-12-31 05:37

Deleting duplicate rows in MySQL, walkthrough

Create the table and insert some rows:

dev-db> create table penguins(foo int, bar varchar(15), baz datetime);
Query OK, 0 rows affected (0.07 sec)
dev-db> insert into penguins values(1, 'skipper', now());
dev-db> insert into penguins values(1, 'skipper', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(4, 'rico', now());
Query OK, 6 rows affected (0.07 sec)
dev-db> select * from penguins;
+------+----------+---------------------+
| foo  | bar      | baz                 |
+------+----------+---------------------+
|    1 | skipper  | 2014-08-25 14:21:54 |
|    1 | skipper  | 2014-08-25 14:21:59 |
|    3 | kowalski | 2014-08-25 14:22:09 |
|    3 | kowalski | 2014-08-25 14:22:13 |
|    3 | kowalski | 2014-08-25 14:22:15 |
|    4 | rico     | 2014-08-25 14:22:22 |
+------+----------+---------------------+
6 rows in set (0.00 sec)

Then remove the duplicates:

dev-db> delete a
    -> from penguins a
    -> left join(
    -> select max(baz) maxtimestamp, foo, bar
    -> from penguins
    -> group by foo, bar) b
    -> on a.baz = maxtimestamp and
    -> a.foo = b.foo and
    -> a.bar = b.bar
    -> where b.maxtimestamp IS NULL;
Query OK, 3 rows affected (0.01 sec)

Result:

dev-db> select * from penguins;
+------+----------+---------------------+
| foo  | bar      | baz                 |
+------+----------+---------------------+
|    1 | skipper  | 2014-08-25 14:21:59 |
|    3 | kowalski | 2014-08-25 14:22:15 |
|    4 | rico     | 2014-08-25 14:22:22 |
+------+----------+---------------------+
3 rows in set (0.00 sec)

What's that delete statement doing

Pseudocode: Group the rows by the two columns you want to remove duplicates of. Choose the one row of each group to keep by using the max aggregate. A left join returns all rows from the left table, with the matching rows in the right table. In this case the left table has all rows in the table, and the right only holds those rows that are NULL (not the one row per group you want to keep). Deleting those rows, you are left with only the unique one per group.

More technical explanation, How you should read that sql delete statement:

Table penguins with alias 'a' is left joined on a subset of table penguins called alias 'b'. The right hand table 'b' which is a subset finds the max timestamp grouped by foo and bar. This is matched to left hand table 'a'. (foo,bar,baz) on left has every row in the table. The right hand subset 'b' has a (maxtimestamp,foo,bar) which is matched to left only on the one that IS the max.

Every row that is not that max has value maxtimestamp of NULL. Filter down on those NULL rows and you have a set of all rows grouped by foo and bar that isn't the latest timestamp baz. Delete those ones.

Make a backup of the table before you run this.

Prevent this problem from ever happening again on this table:

If you got this to work, and it put out your "duplicate rows" fire. Great. Your work isn't done yet. Define a new composite unique key on your table (on those two columns) to prevent more duplicates from being added in the first place. Like a good immune system, the bad rows shouldn't even be allowed in to the table at the time of insert. Later on all those programs adding duplicates will broadcast their protest, and when you fix them, this issue never comes up again.

查看更多
登录 后发表回答