SQL query: how to translate IN() into a JOIN?

I have a lot of SQL queries like this:

SELECT o.Id, o.attrib1, o.attrib2 
  FROM table1 o 
WHERE o.Id IN (
                SELECT DISTINCT Id 
                  FROM table1
                     , table2
                     , table3 
                 WHERE ...
               )

These queries have to run on different database engines (MySQL, Oracle, DB2, MS SQL Server, Hypersonic), so I can only use common SQL syntax.

Here I read that MySQL does not optimize IN with a subquery and that it is really slow, so I want to rewrite this as a JOIN.

I tried:

SELECT o.Id, o.attrib1, o.attrib2 
  FROM table1 o, table2, table3 
  WHERE ...

But this does not take into account the DISTINCT keyword.

Question: How do I get rid of the duplicate rows using the JOIN approach?

Tags: sql mysql oracle db2 performance

4 answers

闹够了就滚

#2 · 2019-05-11 00:42

But this does not take into account the DISTINCT keyword.

You do not need the DISTINCT in the sub-query. The IN returns each outer row at most once, regardless of whether it matches one row or one hundred rows in the sub-query. So, if you want to improve the performance of the query, dropping that DISTINCT would be a good start.

One way of tuning IN clauses is to rewrite them using EXISTS instead. Depending on the distribution of the data this may be a lot more efficient, or it may be slower. With tuning, the benchmark is king.
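A minimal sketch of that point, using Python's built-in sqlite3 module as a stand-in engine. The two-table schema, the sample rows, and the sub-query without a WHERE clause are assumptions made for illustration, since the real filter is elided in the question.

```python
import sqlite3

# Hypothetical schema and data; the question's real WHERE clause is not known.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER PRIMARY KEY, attrib1 TEXT, attrib2 TEXT);
    CREATE TABLE table2 (Id INTEGER, note TEXT);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
    -- Id 1 matches three times, Id 2 once, Id 3 never.
    INSERT INTO table2 VALUES (1, 'p'), (1, 'q'), (1, 'r'), (2, 's');
""")

# Same outer query, with the sub-query swapped in.
outer = (
    "SELECT o.Id, o.attrib1, o.attrib2 FROM table1 o "
    "WHERE o.Id IN ({}) ORDER BY o.Id"
)
with_distinct = conn.execute(outer.format("SELECT DISTINCT Id FROM table2")).fetchall()
without_distinct = conn.execute(outer.format("SELECT Id FROM table2")).fetchall()

print(with_distinct)     # [(1, 'a', 'x'), (2, 'b', 'y')]
print(without_distinct)  # [(1, 'a', 'x'), (2, 'b', 'y')]
```

Both queries return each matching row of table1 exactly once, so the DISTINCT changes nothing in the result. Whether removing it changes the execution plan depends on the engine and has to be measured there.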

SELECT o.Id, o.attrib1, o.attrib2 
FROM table1 o 
WHERE EXISTS (
  -- the select list is irrelevant to EXISTS, so a constant avoids an ambiguous Id
  SELECT 1 FROM table1 t1, table2 t2, table3 t3 WHERE ... 
  AND  ( t1.id = o.id 
         OR t2.id = o.id 
         OR t3.id = o.id )
)

Not knowing your business logic, I may have the precise formulation of that additional filter wrong.

Incidentally, I notice that you have table1 in both the outer query and the sub-query. If that is not a mistake in transcribing your actual SQL to here, you may want to consider whether it makes sense. It would be better to avoid querying that table twice; using EXISTS may make it easier to avoid the double hit.
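As a rough illustration of the EXISTS rewrite, here is a sketch that runs both forms against an in-memory SQLite database from Python. The status column and the filter on it are hypothetical stand-ins for the elided WHERE clause, and the sub-query reads only table2, so table1 is not queried twice.

```python
import sqlite3

# Hypothetical schema and data, chosen only to make the comparison visible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER PRIMARY KEY, attrib1 TEXT, attrib2 TEXT);
    CREATE TABLE table2 (Id INTEGER, status TEXT);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
    INSERT INTO table2 VALUES (1, 'open'), (1, 'open'), (2, 'closed'), (3, 'open');
""")

# Original shape: uncorrelated IN sub-query.
in_rows = conn.execute("""
    SELECT o.Id, o.attrib1, o.attrib2
      FROM table1 o
     WHERE o.Id IN (SELECT t2.Id FROM table2 t2 WHERE t2.status = 'open')
     ORDER BY o.Id
""").fetchall()

# Rewritten shape: correlated EXISTS, with the correlation moved into the sub-query.
exists_rows = conn.execute("""
    SELECT o.Id, o.attrib1, o.attrib2
      FROM table1 o
     WHERE EXISTS (SELECT 1
                     FROM table2 t2
                    WHERE t2.status = 'open'
                      AND t2.Id = o.Id)
     ORDER BY o.Id
""").fetchall()

print(in_rows)      # [(1, 'a', 'x'), (3, 'c', 'z')]
print(exists_rows)  # [(1, 'a', 'x'), (3, 'c', 'z')]
```

The two forms return the same rows here. This says nothing about which one is faster on MySQL, Oracle, or DB2; that still has to be benchmarked per engine.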

贼婆χ

#3 · 2019-05-11 00:56

SELECT DISTINCT o.Id, o.attrib1, o.attrib2 
  FROM table1 o, table2, table3 
 WHERE ...

Though if you need to support a number of different database back ends, you probably want to give each its own set of repository classes in your data layer, so you can optimize your queries for each. This also lets you persist to other types of databases, XML, web services, or whatever else the need arises for down the road.

欢心

#4 · 2019-05-11 01:00

I'm not sure I really understand what your problem is. Why don't you try this:

SELECT DISTINCT o.Id, o.attrib1, o.attrib2
FROM
table1 o
, table2 o1
, table3 o2
...
WHERE
o1.id = o.id
OR o2.id = o.id

一纸荒年 Trace。

#5 · 2019-05-11 01:07

To write this with a JOIN you can use an inner select and join with that:

SELECT o.Id, o.attrib1, o.attrib2 FROM table1 o
JOIN (
  SELECT DISTINCT Id FROM table1, table2, table3 WHERE ...
) T1
ON o.id = T1.Id

I'm not sure this will be much faster, but it might be; you can benchmark it for yourself.

In general, restricting yourself to SQL that works on multiple databases is not going to give you the best performance.
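To show why the DISTINCT belongs in the inner select, here is a small sketch that compares a plain join with the derived-table join, again using an in-memory SQLite database from Python. The schema and rows are made up for the example, and the elided WHERE clause is left out.

```python
import sqlite3

# Hypothetical schema and data; table2 holds a duplicate match for Id 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Id INTEGER PRIMARY KEY, attrib1 TEXT, attrib2 TEXT);
    CREATE TABLE table2 (Id INTEGER);
    INSERT INTO table1 VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
    INSERT INTO table2 VALUES (1), (1), (2);
""")

# Plain join: one output row per matching pair, so Id 1 appears twice.
plain_join = conn.execute("""
    SELECT o.Id, o.attrib1, o.attrib2
      FROM table1 o
      JOIN table2 t2 ON t2.Id = o.Id
     ORDER BY o.Id
""").fetchall()

# Derived table: the inner DISTINCT leaves one row per Id before the join.
derived_join = conn.execute("""
    SELECT o.Id, o.attrib1, o.attrib2
      FROM table1 o
      JOIN (SELECT DISTINCT Id FROM table2) T1
        ON o.Id = T1.Id
     ORDER BY o.Id
""").fetchall()

print(plain_join)    # [(1, 'a', 'x'), (1, 'a', 'x'), (2, 'b', 'y')]
print(derived_join)  # [(1, 'a', 'x'), (2, 'b', 'y')]
```

The derived-table join returns the same rows as the original IN query, without the duplicates that the plain join produces.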
