Is a JOIN faster than a WHERE?

2019-01-07 15:58发布

问题:

Suppose I have two tables that are linked (one has a foreign key to the other) :

CREATE TABLE Document (
  Id INT PRIMARY KEY,
  Name VARCHAR 255
)

CREATE TABLE DocumentStats (
  Id INT PRIMARY KEY,
  DocumentId INT, -- this is a foreign key to table Document
  NbViews INT
)

I know, this is not the smartest way of doing things, but this is the best example I could come up with.

Now, I want to get all documents that have more than 500 views. The two solutions that come to my mind are :

SELECT *
FROM Document, DocumentStats
WHERE DocumentStats.Id = Document.Id
  AND DocumentStats.NbViews > 500

or :

SELECT *
FROM Document
INNER JOIN DocumentStats
ON Document.Id = DocumentStats.Id
WHERE DocumentStats.NbViews > 500

Are both queries equivalent, or is there one way that is far better than the other ? If so, why ?

I'm aware that my example is not perfect, and that the queries may need some tune-up, but I hope you got the point ;) !

EDIT: as requested in the answers, this question was aimed at MSSQL, but I would be interested in knowing if it is different for other DB engines (MySQL, etc...)

回答1:

Theoretically, no, it shouldn't be any faster. The query optimizer should be able to generate an identical execution plan. However, some DB engines can produce better execution plans for one of them (not likely to happen for such a simple query but for complex enough ones). You should test both and see (on your DB engine).



回答2:

Performance of "JOIN" versus "WHERE"... everything hinges on how well the database engine is able to optimize the query for you. It will take into account any indexes you might have on the columns being returned and consider that performance of WHERE and JOIN clauses also come down to the physical database file itself and its fragmentation level and even the storage technology you use to store the database files on.

MSSql server executes queries in the following order (this should give you an idea of the functions of the WHERE and JOIN clauses)

Microsoft Sql Server query process order

the following is taken from the excellent series of books about Microsoft SQL Server, Inside Microsoft SQL Server 2005: T-SQL Querying which can be found here

(Step 8) SELECT (Step 9) DISTINCT (Step 11)
(Step 1) FROM left_table
(Step 3) join_type JOIN right_table
(Step 2) ON join_condition
(Step 4) WHERE where_condition
(Step 5) GROUP BY group_by_list
(Step 6) WITH [CUBE|ROLLUP]
(Step 7) HAVING having_clause
(Step 10) ORDER BY order_by_list



回答3:

There is no way to correctly answer this without limiting to a target database.

For MS-SQL both queries result in the same execution plans, but keep in mind:

SELECT *
FROM Document, DocumentStats
WHERE DocumentStats.Id = Document.Id
  AND DocumentStats.NbViews > 500

Is really risky since it is easy to forget the join condition in the WHERE clause and end up with a nasty cross join.



回答4:

In MySQL at least, they will both be optimized to the same query.



回答5:

It is a "standard" to use the INNER JOIN syntax, although practically equivalent. The main reason it should be used is for clarity and mobility purposes as it is consistent with OUTER JOIN syntax.



回答6:

When you use Sqlite: The where-syntax is slightly faster because Sqlite first translates the join-syntax into the where-syntax before executing the query.



回答7:

If you're talking specifically about SQL Server, then you should definitely be using the INNER JOIN syntax. Apart from being (personal opinion alert!) easier to read and more clear in intent, there is, as of SQL Server 2005, no equivalent syntax for outer joins. The *= and =* syntax is not supported by default in 2005 -- you need to enable a compatibility mode to support it. It will eventually be removed, possibly as soon as the next release (or possibly not!)

This means:

  • If you need to change a query from inner join to outer join, you need to either rewrite it (argh) or enable compat mode (yuk)
  • Without compat mode, you can't be consistent with how you implement different types of joins (inner vs outer), making for a maintenance nightmare (and, where the two are combined in the one query, some behaviour that's non-intuitive).

Note also that contrary to popular belief, the two are not equivalent. Some things are much more awkward, and some are simply not possible. Kalen Delaney's Inside SQL Server 2000 covers some examples; not sure if the newer editions do, because that join syntax is deprecated anyway.



回答8:

Explicit joins are easier to maintain as the intent of the query is much clearer. Also they are not subject to accidental cross joins so if you have a cross join in the query, the maintainer knows it was intended to be there.

If you ever need to use outer joins, you should know that the *= syntax is deprecated in SQL Server and will soon be removed. Also it currently doesn't function as expected all the time and may not give correct results and thus should NEVER be used. Mixing explicit outer joins and where clause joins (implicit joins) makes a query much harder for a maintainer to read and understand.



回答9:

In MSSQL, both queries are compiled to the same execution plan, so there's no difference. It's more about readability - I think the JOIN is easier to read, so I use that.



回答10:

I guess that it doesn't make a difference too. To be sure you can check if the explain plan of those two queries is identical. In order to look at the explain plan in MySQL you have to put the "explain" keyword before the statement, eg:

EXPLAIN
SELECT *
FROM Document, DocumentStats
WHERE DocumentStats.Id = Document.Id
  AND DocumentStats.NbViews > 500

I'm sure there exists an equivalent in MSSQL too.

By the way: This looks like this is a 1:1 relationship so I'd just include the nbviews attribute directly in the Document table, therefore you can save a join.