I have a case where using a JOIN or an IN will give me the correct results... Which typically has better performance and why? How much does it depend on what database server you are running? (FYI I am using MSSQL)
相关问题
- SQL join to get the cartesian product of 2 columns
- sql execution latency when assign to a variable
- Difference between Types.INTEGER and Types.NULL in
- What is the best way to cache a table from a (SQL)
- php PDO::FETCH_ASSOC doesnt detect select after ba
Generally speaking,
IN
andJOIN
are different queries that can yield different results.is not the same as
, unless
b.col
is unique.However, this is the synonym for the first query:
If the joining column is
UNIQUE
and marked as such, both these queries yield the same plan inSQL Server
.If it's not, then
IN
is faster thanJOIN
onDISTINCT
.See this article in my blog for performance details:
IN
vs.JOIN
vs.EXISTS
The optimizer should be smart enough to give you the same result either way for normal queries. Check the execution plan and they should give you the same thing. If they don't, I would normally consider the JOIN to be faster. All systems are different, though, so you should profile the code on your system to be sure.
Funny you mention that, I did a blog post on this very subject.
See Oracle vs MySQL vs SQL Server: Aggregation vs Joins
Short answer: you have to test it and individual databases vary a lot.
A interesting writeup on the logical differences: SQL Server: JOIN vs IN vs EXISTS - the logical difference
I am pretty sure that assuming that the relations and indexes are maintained a Join will perform better overall (more effort goes into working with that operation then others). If you think about it conceptually then its the difference between 2 queries and 1 query.
You need to hook it up to the Query Analyzer and try it and see the difference. Also look at the Query Execution Plan and try to minimize steps.
This Thread is pretty old but still mentioned often. For my personal taste it is a bit incomplete, because there is another way to ask the database with the EXISTS keyword which I found to be faster more often than not.
So if you are only interested in values from table a you can use this query:
The difference might be huge if col is not indexed, because the db does not have to find all records in b which have the same value in col, it only has to find the very first one. If there is no index on b.col and a lot of records in b a table scan might be the consequence. With IN or a JOIN this would be a full table scan, with EXISTS this would be only a partial table scan (until the first matching record is found).
If there a lots of records in b which have the same col value you will also waste a lot of memory for reading all these records into a temporary space just to find that your condition is satisfied. With exists this can be usually avoided.
I have often found EXISTS faster then IN even if there is an index. It depends on the database system (the optimizer), the data and last not least on the type of index which is used.
That's rather hard to say - in order to really find out which one works better, you'd need to actually profile the execution times.
As a general rule of thumb, I think if you have indices on your foreign key columns, and if you're using only (or mostly) INNER JOIN conditions, then the JOIN will be slightly faster.
But as soon as you start using OUTER JOIN, or if you're lacking foreign key indexes, the IN might be quicker.
Marc