CREATE TABLE y (Id INT NOT NULL PRIMARY KEY );
CREATE TABLE x (YId INT NOT NULL REFERENCES y (Id));
DECLARE @YId INT = …;
If I wanted to select all records in x
where YId
is equal to @YId
, I could to this three different ways:
/* 1: */ SELECT x.* FROM x WHERE x.YId = @YId;
/* 2: */ SELECT x.* FROM x INNER JOIN y ON x.YId = @YId;
/* 3: */ SELECT x.* FROM x INNER JOIN y ON x.YId = y.Id WHERE y.Id = @YId;
Query 1 is obviously the most straight-forward query, and the one with the best performance. (See the execution plan further below.) Query 3 might be more complicated than strictly necessary, but is perhaps also a fairly common solution.
What surprises me is query 2 (which, according to the execution plan, gets executed just like query 3): Its INNER JOIN
clause does not depend on y
at all!
Question: Why does SQL Server even accept this seemingly nonsensical join condition as valid? Are there any specific reasons or situations that explain why the ON
condition is not required to refer to the joined table?
Execution plans:
SQL Server enables it, because the JOIN
syntax is the following (from MSDN):
<joined_table> ::=
{
<table_source> <join_type> <table_source> ON <search_condition>
| <table_source> CROSS JOIN <table_source>
| left_table_source { CROSS | OUTER } APPLY right_table_source
| [ ( ] <joined_table> [ ) ]
}
<join_type> ::=
[ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
JOIN
Where the <search_condition>
is the following:
Specifies the condition on which the join is based. The condition can specify any predicate, although columns and comparison operators are
frequently used.
When the condition specifies columns, the columns do not have to have the same name or same data type; however, if the data types are not
the same, they must be either compatible or types that SQL Server can
implicitly convert. If the data types cannot be implicitly converted,
the condition must explicitly convert the data type by using the
CONVERT function.
There can be predicates that involve only one of the joined tables in the ON clause. Such predicates also can be in the WHERE clause in
the query. Although the placement of such predicates does not make a
difference for INNER joins, they might cause a different result when
OUTER joins are involved. This is because the predicates in the ON
clause are applied to the table before the join, whereas the WHERE
clause is semantically applied to the result of the join.
For more information about search conditions and predicates, see Search Condition (Transact-SQL).
The important part is highlighted with bold.
Here is an example:
SELECT
U.UserName, P.PrivilegeName, P.Status
FROM
dbo.Users U
LEFT JOIN dbo.Privileges P
ON U.UserID = P.UserID AND P.PrivilegeName = 'EditComment'
This query lists all users and adds the status of the 'EditComment' privilege. If you move the condition to the WHERE clause, the result is different. (Only users with the 'EditComment' privilege will be listed.)
The above query will list all users with NULL
in the Status
and PrivilegeName
columns in the result if there is no matching row.
The same result could be achived with a subquery:
SELECT
U.UserName, P.PrivilegeName, P.Status
FROM
dbo.Users U
LEFT JOIN (SELECT UserID, Status FROM dbo.Privileges WHERE P.PrivilegeName = 'EditComment') P
ON U.UserID = P.UserID
So this is some kind of shortand of the subquery approach.
Moving the condition to the WHERE
clause
SELECT
U.UserName, P.PrivilegeName, P.Status
FROM
dbo.Users U
LEFT JOIN dbo.Privileges P
ON U.UserID = P.UserID
WHERE
P.PrivilegeName = 'EditComment'
If the condition is in the WHERE
clause, the result will be filtered to that rows which has a mathing row in the Privileges table and that row's PrivilegeName
is EditComment
. This is basically an INNER JOIN
. (Filtering to a column in the WHERE
clause which is in a LEFT JOIN
ed table filters all NULL
values from the result - except if there is an OR
condition with IS NULL
)
Execution Plans
The difference in the execution plans are because of the logical processing of the SELECT queries.
FROM, ON, JOIN, WHERE, GROUP BY, WITH CUBE or WITH ROLLUP, HAVING, SELECT, DISTINCT, ORDER BY, TOP
JOIN
is processed before WHERE
.
Change the INNER JOIN
in your query to LEFT
or any OUTER
join and analyze the query. The results could be different.
EDIT
The second query looks like a filtered cross-join:
/* 4: */ SELECT x.* FROM x INNER JOIN y ON 1=1;
/* 5: */ SELECT x.* FROM x CROSS JOIN y
When you change the ON condition in the 4th query to your original one in the 2nd query, that equals to
/* 6: */ SELECT x.* FROM x CROSS JOIN y WHERE x.YId = @YId
Why does SQL Server even accept this seemingly nonsensical join condition as valid?
Because it is valid (it has a different meaning, but syntactically it is valid)
Are there any specific reasons or situations that explain why the ON condition is not required to refer to the joined table?
Reason? yes, it is syntactically valid
Situation: CROSS JOIN with filtering for example. Join 1 row from a table to all records in the resultset (e.g. the record of the current date from a point-in-time/date/calendar table for reporting purposes)
trying to answer, shortly, and repeating @Pred answer
I think the answer is, imho, yes. It allows shorcut for subquery.
2 / SELECT x.* FROM x INNER JOIN y ON x.YId = @YId
is logically equivalent to
select
tx.*
from
(
select x.* from x where x.YId = @YId
) tx
cross join y
this allows to reduce the number of explored rows from x * y.
as an illustration :
declare @ty table (Id int not null primary key)
declare @tx table (Id Char(1) not null primary key, yId int not null)
insert into @ty values (1) ,(2), (3)
insert into @tx values ('A', 1), ('B', 1), ('C', 2)
declare @YId int = 1
SELECT x.* FROM @tx x WHERE x.YId = @YId
SELECT x.* FROM @tx x INNER JOIN @ty y ON x.yId = @YId
SELECT x.* FROM @tx x INNER JOIN @ty y ON x.YId = y.Id WHERE y.Id = @YId
will give
A 1
B 1
A 1
A 1
A 1
B 1
B 1
B 1
A 1
B 1
yes a distinct will give the same results.
On the other end: why a join if you neither need to return or test from the joined table ?
Isn't the real question, or a reformulated question: why the query optimizer isn't more smart ?
In an nutshell the answer is that ISO Standard SQL is a strange beast. Microsoft is just following the standard that the IT industry has stuck doggedly to for the last 3 decades.
SQL Server allows JOIN conditions that don't depend on the joined table at all
That's for full cross join :
insert into y values(1),(2);
insert into x values(1),(2);
Result of query 2:
YID
1
1
Query 1 and 3 must get the same result, but sql server is not clever enough to recognize it. That's why DBAs living for.