What query will be the fastest?

I have three ways to construct my query:

The first one:

select obj from table1 where condition1 and obj in (
select obj from table2 where condition2 and obj in (
select obj from table3 where condition3 and obj in (
...
)))

The second one:

select obj from table1 where condition1
and obj in (select obj from table2 where condition2)
and obj in (select obj from table3 where condition3)
...

The third one:

select table1.obj from table1
inner join table2 on table2.obj = table1.obj and table2.condition='condition2'
inner join table3 on table3.obj = table2.obj and table3.condition='condition3'
...
where table1.condition='condition1'

My questions is if these queries provide the same result and if these query are equally optimal.

I am pretty sure that the first two queries produce the same output but the second query is faster. Am not sure about the third query.

ADDED

There is also another option:

select table1.obj from table1
inner join table2 on table2.obj = table1.obj
inner join table3 on table3.obj = table2.obj
...
where
table1.condition='condition1' and 
table2.condition='condition2' and 
table3.condition='condition3'

标签： mysql optimization join subquery

4条回答

冷血范

2楼-- · 2019-05-31 18:28

Well, first, those queries don't necessarily return the same results :

1. SELECT x FROM table1 WHERE x IN (SELECT x FROM table2 WHERE y = foo)

2. SELECT x FROM table1 JOIN table 2 USING x WHERE table2.y = foo

The IN() removes duplicates. So, if table2 contains 50 values of x satisfying condition y=foo, query 2 will return 50x more rows than query 1. That may be what you want, or not. If x is UNIQUE in both tables, the queries will give the same results.

subqueries (Options 1 and 2) are executed for every row of the outer query

This is of course wrong, since the subqueries are not dependent subqueries. Anyway, it will execute the subquery once, and hash it or sort it to eliminate duplicates (as required by IN()) then use the resulting list to perform an IN-join. MySQL until recent versions did re-execute the subquery for each row, this is no longer the case. Since MySQL doesn't do hash IN joins though, it might still be much slower than a JOIN.

0人赞添加讨论(0) 举报

可以哭但决不认输i

3楼-- · 2019-05-31 18:32

Although there are always exceptions, Option 3 is almost surely the best/first choice. Depending on your indexes and data distributions, the MySQL query execution planner will handle which order to pull from the tables.

In the other cases, subqueries (Options 1 and 2) are executed for every row of the outer query -- they can be terribly inefficient. So, following the previous statement, nested subqueries (Option 1), can be exponentially worse than using first order subqueries (Option 2) or normal joins (Option 3).

Note that for INNER JOINs, it doesn't matter with respect to performance or functionality if the extra conditions are in the JOIN clauses or in the WHERE clauses. Consequently, your other option is effectively equivalent to the Option 3.

0人赞添加讨论(0) 举报

走好不送

4楼-- · 2019-05-31 18:40

I would suspect the third query would be fastest. SQL is optimized to make JOINs a fast operation.

But the only way to know with YOUR data is to try them and see.

0人赞添加讨论(0) 举报

Anthone

5楼-- · 2019-05-31 18:48

Basicaly the way of execute a query is the following : FROM (get the whole data) => WHERE (apply constraints) =>SELECT (display results)

JOIN clause are always the best choice for joining data because in the WHERE clause, only data that matches the JOIN clauses are tested.

In the FROM, only field on the ON clause are selected and tested before getting the whole data for the WHERE and the SELECT.

In your both first examples, for each SELECT the whole content of tables is selected for the tests of the WHERE clause. It is almost the same problem for joining in the WHERE clause.

The both last examples seems to be the same. I would prefer the last one because the use of the ON clause in the third example is useful for RIGHT or LEFT JOIN but in this case, it just brings mess in the clauses'use : FROM : get the accurate resource, WHERE : apply constraints.

This explanation is very schematic but I hope it makes sense ...

0人赞添加讨论(0) 举报

What query will be the fastest?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间