Speed up MySQL join to check for duplicates

2019-08-02 21:17发布

问题:

I'm using the following query to return all duplicate records with the same first and last name. The trick is that the contact_id, has to be in descending order.

The query returns the contacts as expected, but it is just SO SLOW! Takes about 6-8 seconds when checking around 30,000 records.

I have the contact_firstName, contact_lastName, contact_client_id, and contact_id all indexed in the database.

Any ideas what I could do to try and speed this up a bit? Thanks for your help :)

SELECT z.contact_id, z.contact_firstName, z.contact_lastName, RIGHT(z.contact_lastName,1) AS nameNum
FROM (`contacts` x) 
JOIN `contacts` z ON `x`.`contact_firstName` = `z`.`contact_firstName` 
AND x.contact_lastName = z.contact_lastName 
AND x.contact_client_id = ".$ID." 
AND z.contact_client_id = ".$ID." 
WHERE `x`.`contact_id` < `z`.`contact_id` 
GROUP BY `z`.`contact_id` 

回答1:

Not making any promises, but here's an alternative to try:

SELECT c.contact_id, c.contact_firstName, c.contact_lastName, RIGHT(c.contact_lastName,1) AS nameNum
    FROM (SELECT contact_firstName, contact_lastName, MIN(contact_id) AS MinID
              FROM contacts
              WHERE contact_client_id = ".$ID."
              GROUP BY contact_firstName, contact_lastName
              HAVING COUNT(*) > 1) t
        INNER JOIN contacts c
            ON t.contact_firstName = c.contact_firstName
                AND t.contact_lastName = c.contact_lastName
                AND c.contact_client_id = ".$ID."
                AND t.MinID <> c.contact_id


回答2:

SELECT z.contact_id, z.contact_firstName, z.contact_lastName
, RIGHT(z.contact_lastName,1) AS nameNum
FROM `contacts` x
JOIN `contacts` z ON (x.contact_client_id = z.contact_client_id)
WHERE `x`.`contact_id` < `z`.`contact_id` 
And x.contact_client_id = '$id'
GROUP BY `z`.`contact_id` 

Make sure you have an index on:
- contact_id.
- contact_client_id