I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?
相关问题
- sqlyog export query result as csv
- NOT DISTINCT query in mySQL
- MySQL: conduct a basic search
- Why sometimes there is one of more gap(s) in the v
- mySQL alter table on update, current timestamp
One very late contribution... in case it helps anyone waaaaaay down the line... I had a task to find matching pairs of transactions (actually both sides of account-to-account transfers) in a banking app, to identify which ones were the 'from' and 'to' for each inter-account-transfer transaction, so we ended up with this:
The result is that the
DuplicateResultsTable
provides rows containing matching (i.e. duplicate) transactions, but it also provides the same transaction id's in reverse the second time it matches the same pair, so the outerSELECT
is there to group by the first transaction ID, which is done by usingLEAST
andGREATEST
to make sure the two transactionid's are always in the same order in the results, which makes it safe toGROUP
by the first one, thus eliminating all the duplicate matches. Ran through nearly a million records and identified 12,000+ matches in just under 2 seconds. Of course the transactionid is the primary index, which really helped.This query returns complete records, not just distinct
varchar_column
's.This query doesn't use
COUNT(*)
. If there are lots of duplicates,COUNT(*)
is expensive, and you don't need the wholeCOUNT(*)
, you just need to know if there are two rows with same value.Having an index on
varchar_column
will, of course, speed up this query greatly.I saw the above result and query will work fine if you need to check single column value which are duplicate. For example email.
But if you need to check with more columns and would like to check the combination of the result so this query will work fine:
Do a
SELECT
with aGROUP BY
clause. Let's say name is the column you want to find duplicates in:This will return a result with the name value in the first column, and a count of how many times that value appears in the second.
My final query incorporated a few of the answers here that helped - combining group by, count & GROUP_CONCAT.
This provides the id of both examples (comma separated), the barcode I needed, and how many duplicates.
Change table and columns accordingly.