I've got a couple of duplicates in a database that I want to inspect, so what I did to see which are duplicates, I did this:
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
This way, I will get all rows with relevant_field occuring more than once. This query takes milliseconds to execute.
Now, I wanted to inspect each of the duplicates, so I thought I could SELECT each row in some_table with a relevant_field in the above query, so I did like this:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM some_table
GROUP BY relevant_field
HAVING COUNT(*) > 1
)
This turns out to be extreeeemely slow for some reason (it takes minutes). What exactly is going on here to make it that slow? relevant_field is indexed.
Eventually I tried creating a view "temp_view" from the first query (SELECT relevant_field FROM some_table GROUP BY relevant_field HAVING COUNT(*) > 1)
, and then making my second query like this instead:
SELECT *
FROM some_table
WHERE relevant_field IN
(
SELECT relevant_field
FROM temp_view
)
And that works just fine. MySQL does this in some milliseconds.
Any SQL experts here who can explain what's going on?
This is similar to my case, where I have a table named
tabel_buku_besar
. What I need areLooking for record that have
account_code='101.100'
intabel_buku_besar
which havecompanyarea='20000'
and also haveIDR
ascurrency
I need to get all record from
tabel_buku_besar
which have account_code same as step 1 but havetransaction_number
in step 1 resultwhile using
select ... from...where....transaction_number in (select transaction_number from ....)
, my query running extremely slow and sometimes causing request time out or make my application not responding...I try this combination and the result...not bad...
I have reformatted your slow sql query with www.prettysql.net
When using a table in both the query and the subquery, you should always alias both, like this:
Does that help?
Rewrite the query into this
I think
st2.relevant_field
must be in the select, because otherwise thehaving
clause will give an error, but I'm not 100% sureNever use
IN
with a subquery; this is notoriously slow.Only ever use
IN
with a fixed list of values.More tips
SELECT *
only select the fields that you really need.relevant_field
to speed up the equi-join.group by
on the primary key.General solution for 90% of your
IN (select
queriesUse this code
The subquery is being run for each row because it is a correlated query. One can make a correlated query into a non-correlated query by selecting everything from the subquery, like so:
The final query would look like this: