MySQL - SELECT WHERE field IN (subquery) - Extreme

2019-01-05 08:30发布

I've got a couple of duplicates in a database that I want to inspect, so what I did to see which are duplicates, I did this:

SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1

This way, I will get all rows with relevant_field occuring more than once. This query takes milliseconds to execute.

Now, I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table with a relevant_field in the above query, so I did like this:

SELECT *
FROM some_table 
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
)

This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.

Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1), and then making my second query like this instead:

SELECT *
FROM some_table
WHERE relevant_field IN
(
    SELECT relevant_field
    FROM temp_view
)

And that works just fine. MySQL does this in some milliseconds.

Any SQL experts here who can explain what's going on?

10条回答
等我变得足够好
2楼-- · 2019-01-05 08:35

I find this to be the most efficient for finding if a value exists, logic can easily be inverted to find if a value doesn't exist (ie IS NULL);

SELECT * FROM primary_table st1
LEFT JOIN comparision_table st2 ON (st1.relevant_field = st2.relevant_field)
WHERE st2.primaryKey IS NOT NULL

*Replace relevant_field with the name of the value that you want to check exists in your table

*Replace primaryKey with the name of the primary key column on the comparison table.

查看更多
我欲成王,谁敢阻挡
4楼-- · 2019-01-05 08:38
SELECT st1.*
FROM some_table st1
inner join 
(
    SELECT relevant_field
    FROM some_table
    GROUP BY relevant_field
    HAVING COUNT(*) > 1
)st2 on st2.relevant_field = st1.relevant_field;

I've tried your query on one of my databases, and also tried it rewritten as a join to a sub-query.

This worked a lot faster, try it!

查看更多
放我归山
5楼-- · 2019-01-05 08:46

sometimes when data grow bigger mysql WHERE IN's could be pretty slow because of query optimization. Try using STRAIGHT_JOIN to tell mysql to execute query as is, e.g.

SELECT STRAIGHT_JOIN table.field FROM table WHERE table.id IN (...)

but beware: in most cases mysql optimizer works pretty well, so I would recommend to use it only when you have this kind of problem

查看更多
Melony?
6楼-- · 2019-01-05 08:48

Firstly you can find duplicate rows and find count of rows is used how many times and order it by number like this;

SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
		CASE q.NID
		WHEN @curCode THEN
			@curRow := @curRow + 1
		ELSE
			@curRow := 1
		AND @curCode := q.NID
		END
	) AS No
FROM UserInfo q,
(
		SELECT
			@curRow := 1,
			@curCode := ''
	) rt
WHERE q.NID IN
(
    SELECT NID
    FROM UserInfo
    GROUP BY NID
    HAVING COUNT(*) > 1
) 

after that create a table and insert result to it.

create table CopyTable 
SELECT q.id,q.name,q.password,q.NID,(select count(*) from UserInfo k where k.NID= q.NID) as Count,
(
		CASE q.NID
		WHEN @curCode THEN
			@curRow := @curRow + 1
		ELSE
			@curRow := 1
		AND @curCode := q.NID
		END
	) AS No
FROM UserInfo q,
(
		SELECT
			@curRow := 1,
			@curCode := ''
	) rt
WHERE q.NID IN
(
    SELECT NID
    FROM UserInfo
    GROUP BY NID
    HAVING COUNT(*) > 1
) 

Finally, delete dublicate rows.No is start 0. Except fist number of each group delete all dublicate rows.

delete from  CopyTable where No!= 0;

查看更多
祖国的老花朵
7楼-- · 2019-01-05 08:52

Try this

SELECT t1.*
FROM 
 some_table t1,
  (SELECT relevant_field
  FROM some_table
  GROUP BY relevant_field
  HAVING COUNT (*) > 1) t2
WHERE
 t1.relevant_field = t2.relevant_field;
查看更多
登录 后发表回答