I need to DELETE
duplicated rows for specified sid on a MySQL
table.
How can I do this with an SQL query?
DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"
Something like this, but I don't know how to do it.
I need to DELETE
duplicated rows for specified sid on a MySQL
table.
How can I do this with an SQL query?
DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"
Something like this, but I don't know how to do it.
This work for me to remove old records:
You can replace min(e.id) to max(e.id) to remove newest records.
This here will make the column
column_name
into a primary key, and in the meantime ignore all errors. So it will delete the rows with a duplicate value forcolumn_name
.I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain sql, is very understandable.
As I stated in my comment, that solution hasn't been properly explained though. So this is mine, based on it.
1) add a new boolean column
2) add a constraint on the duplicated columns AND the new column
3) set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint
4) delete rows that have not been marked as tokeep
5) drop the added column
I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.
Deleting duplicate rows in MySQL, walkthrough
Create the table and insert some rows:
Then remove the duplicates:
Result:
What's that delete statement doing
Pseudocode: Group the rows by the two columns you want to remove duplicates of. Choose the one row of each group to keep by using the max aggregate. A left join returns all rows from the left table, with the matching rows in the right table. In this case the left table has all rows in the table, and the right only holds those rows that are NULL (not the one row per group you want to keep). Deleting those rows, you are left with only the unique one per group.
More technical explanation, How you should read that sql delete statement:
Table penguins with alias 'a' is left joined on a subset of table penguins called alias 'b'. The right hand table 'b' which is a subset finds the max timestamp grouped by foo and bar. This is matched to left hand table 'a'. (foo,bar,baz) on left has every row in the table. The right hand subset 'b' has a (maxtimestamp,foo,bar) which is matched to left only on the one that IS the max.
Every row that is not that max has value maxtimestamp of NULL. Filter down on those NULL rows and you have a set of all rows grouped by foo and bar that isn't the latest timestamp baz. Delete those ones.
Make a backup of the table before you run this.
Prevent this problem from ever happening again on this table:
If you got this to work, and it put out your "duplicate rows" fire. Great. Your work isn't done yet. Define a new composite unique key on your table (on those two columns) to prevent more duplicates from being added in the first place. Like a good immune system, the bad rows shouldn't even be allowed in to the table at the time of insert. Later on all those programs adding duplicates will broadcast their protest, and when you fix them, this issue never comes up again.