I inserted records from one table into another on fields A, B, C, D, believing I had created a unique index on (A, B, C, D) to prevent duplicates. However, I had actually created a plain (non-unique) index on those columns, so duplicates got inserted. The table has 20 million records.
If I change my existing index from normal to unique, or simply add a new unique index on A, B, C, D, will the duplicates be removed, or will adding it fail because duplicate records exist? I would test it myself, but with 20 million records I don't want to either mess up the table or duplicate it.
If duplicates exist, adding the unique index will fail. First, check which duplicate keys are present:
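A query along these lines lists every (A, B, C, D) combination that occurs more than once (the table name my_table is an assumption; substitute your own):

```sql
-- Find key combinations that appear more than once.
SELECT a, b, c, d, COUNT(*) AS cnt
FROM my_table
GROUP BY a, b, c, d
HAVING COUNT(*) > 1;
```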
This may be an expensive query on 20M rows, but it will get you all the duplicate keys that would prevent you from adding the unique index. You could split it into smaller chunks by adding a WHERE clause, e.g.:

where a = 'some_value'
For the records retrieved, you will have to change something to make the rows unique. Once that is done (the query returns 0 rows), you should be safe to add the unique index.
Instead of IGNORE, you can use INSERT ... ON DUPLICATE KEY UPDATE, which gives you control over which values should prevail.
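As a sketch of that approach (table and column names here are assumptions, not from the original), you can copy the rows into a fresh table that already has the unique index, and let ON DUPLICATE KEY UPDATE decide which value wins on a collision:

```sql
-- New table with the intended unique index (names are illustrative).
CREATE TABLE my_table_dedup LIKE my_table;
ALTER TABLE my_table_dedup ADD UNIQUE INDEX uq_abcd (a, b, c, d);

-- When a (a, b, c, d) key collides, keep the incoming row's value.
INSERT INTO my_table_dedup (a, b, c, d, some_col)
SELECT a, b, c, d, some_col
FROM my_table
ON DUPLICATE KEY UPDATE some_col = VALUES(some_col);
```

Afterwards you could swap the tables with RENAME TABLE once you have verified the result.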
To answer your question: adding a UNIQUE constraint on columns that contain duplicate values will throw an error. For example, you can try the following script:
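A minimal reproduction might look like this (table and index names are illustrative):

```sql
-- A small table with one duplicated key.
CREATE TABLE t (a INT, b INT);
INSERT INTO t VALUES (1, 1), (1, 1);

-- Fails with Error 1062, because (1, 1) appears twice.
ALTER TABLE t ADD UNIQUE INDEX uq_ab (a, b);
```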
If you have duplicates in your table and you use ALTER TABLE ... ADD UNIQUE, the query will fail with Error 1062 (duplicate key).
But if you use ALTER IGNORE TABLE, the duplicates will be removed. However, the documentation doesn't specify which of the duplicate rows will be kept:
(ALTER TABLE Syntax)
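For versions before 5.7.4, the deduplicating form looks like this (index and table names are illustrative):

```sql
-- Silently drops duplicate rows while adding the index (pre-5.7.4 only);
-- which duplicate survives is not defined by the documentation.
ALTER IGNORE TABLE my_table ADD UNIQUE INDEX uq_abcd (a, b, c, d);
```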
If your version is 5.7.4 or greater, ALTER IGNORE TABLE is no longer available; you can instead copy the data into a new table that has the unique index, using INSERT IGNORE (which is still available). (INSERT Syntax)
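A sketch of that workaround (table names are assumptions; take a backup before swapping tables):

```sql
-- Build a new table with the unique index, then copy with INSERT IGNORE:
-- rows that would violate the unique index are silently skipped.
CREATE TABLE my_table_new LIKE my_table;
ALTER TABLE my_table_new ADD UNIQUE INDEX uq_abcd (a, b, c, d);

INSERT IGNORE INTO my_table_new SELECT * FROM my_table;

-- Swap the tables atomically once the copy is verified.
RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;
```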
Also see: INSERT ... SELECT Syntax and Comparison of the IGNORE Keyword and Strict SQL Mode