I have a MySQL query (Ubu 10.04,Innodb, Core i7, 16Gb RAM, SSD drives, MySQL params optimized):
SELECT
COUNT(DISTINCT subscriberid)
FROM
em_link_data
WHERE
linkid in (SELECT l.id FROM em_link l WHERE l.campaignid = '2900' AND l.link != 'open')
The table em_link_data has about 7million rows, em_link has a few thousand.
This query will take about 18 seconds to complete. However, if I substitute the results
of the subquery and do this:
SELECT
COUNT(DISTINCT subscriberid)
FROM
em_link_data
WHERE
linkid in (24899,24900,24901,24902);
then the query will run in less than 1 millisecond. The subquery alone runs in less than 1ms, the column linkid is indexed.
If I rewrite the query as a join, also less than 1ms. Why is a "IN" query so slow with a subquery in it and why so fast with values in it? I can't rewrite the query (bought software) so I was hoping there is some tweak or hint to speedup this query! Any help is appreciated.
Subqueries execute every time you evaluate them (in MySQL anyway, not all RDBMSes), i.e. you're basically running 7 million queries! Using a JOIN, if possible, will reduce this to 1. Even if adding indexing improves performance of those, you're still running them.
Yes, IN
with subqueries is slow. Use a join instead.
SELECT
COUNT(DISTINCT subscriberid)
FROM em_link_data JOIN em_link ON em_link_data.linkid=em_link.id
WHERE em_link.campaignid = '2900' AND em_link.link != 'open'
And make sure you've defined indexes on em_link_data.linkid
and em_link.id
.
The problem is that MySQL executes queries from outside to inside, while you might think that your subquery is done once and then its results are passed to the WHERE expression of the outer query (see MySQL documentation).
If you can't rewrite your query, you should do the following optimizations:
- add an index on
campaignid
and link
as FrustratedWithFormsDesigner said
- check that the subquery uses indexes correctly by doing
EXPLAIN SELECT ...
- enable and tweak query cache, as that should speed up the subquery being called multiple times
One more idea would be to install MySQL proxy and write a little script that intercepts your query and rewrites it to use a join.
If your subquery is fast thus campaignid and link are absolutely indexed.
l.id is PK and clustered thus is fast.
But as far as I remember(from last time I checked this subject) , mysql describes about its internal optimizations for "in" subqueries to use the index sort of subquery result to improve performance and also uses cache for the left side of "IN" to drag it inside the subquery faster and if indexes are set true it must not have such difference to use inner join or "IN" rather than caching and it may be due to cache problem and massive data.
http://dev.mysql.com/doc/internals/en/transformation-scalar-in.html
I don't know the situation of the software but if you can use INNER JOIN and you have (probably) some additional definitions before the IN clause in the WHERE clause of your outer query make sure to move that clauses to before of your main INNER JOIN via a temporary INNER JOIN behaves similar to an intervenient "where" clause sequentially and reduces the number of cross comparisons in a JOIN like this :
SELECT ... FROM t
INNER JOIN (SELECT 1) AS tmp ON t.asd=23
INNER JOIN t2 ON ...
Sample comparisons of normal and temp join lookups : 1000 * 1000 > 1000 + (100 * 1000)
Also it seems the subquery is filtered by constant vals thus if it was me I was gonna put the clauses in a sub query generating the resultset and reduce the number of comparisons in a JOIN like this :
SELECT ... FROM t
INNER JOIN (SELECT ... FROM t2 WHERE constant clauses) AS tbl2 ON ...
Anyway, in the "IN" query, comparing any column of the table in the subquery to any column of the table in outer query requires the columns of both side to be precisely indexed (regard to composite indexes) but still it may be a cache problem.
EDITED :
Also I was curious to ask : Can the making a composite index on l.campaignid, l.link and l.id make any sense?