I need advice regarding subselect performance in MySQL. For a reason that I can't change, I am not able to use JOIN to create quesry filter, I can only add another AND clause in WHERE.
What is the peformance of:
select tasks.*
from tasks
where
some criteria
and task.project_id not in (select id from project where project.is_template = 1);
compared to:
select tasks.*
from tasks, project
where
some criteria
and task.project_id = project.id and project.is_template <> 1;
Note that there is relatively small number of projects whete is_template = 1, and there could be large number of projects where is_template <> 1.
Is there other way to achieve the same result without subselects if I can't change anything but and filter?
I believe that the second is more efficient as it requires only one select, but to be sure, you should EXPLAIN each query and check the results.
EXPLAIN select tasks.*
from tasks
where
some criteria
and task.project_id not in (select id from project where project.is_template = 1);
EXPLAIN select tasks.*
from tasks, project
where
some criteria
and task.project_id = project.id and project.is_template <> 1;
How much difference there is between the two could depend greatly on what "some criteria" is and what opportunities to use indexes it provides. But note that they are not equivalent in terms of results if there are tasks that don't have projects. The second is equivalent to this:
select tasks.*
from tasks
where
some criteria
and task.project_id in (select id from project where project.is_template <> 1);
I think the first may scale better:
When you do a join, internally mysql makes a sort of temporary table consisting of the two tables joined according to the join conditions specified. You aren't giving a join condition, so it'll create a temp table with all tasks listed against all projects. I'm fairly sure (but do check with the explain tool) that it does this prior to applying any where clauses.
Result: if there are 10 of each, it'll have 10 * 10 rows = 100. You can see how this gets big as numbers rise. It then applies the where to this temporary table.
By contrast, the subquery selects only the relevant rows from each table.
But unless scaling is a concern, I don't think it really matters.
Avoid sub queries like the plague in MySQL versions < 6.0, and I doubt you're using 6.0 considering it's still in the alpha phase of development. AFAIK, the MySQL optimizer does not handle subqueries well at all. Some major work has gone into revamping the optimizer for 6.0 and sub queries are working much better now, but these changes haven't trickled down into the 5.0 or 5.1 series.