Are there any reasons why SQL Server allows JOIN c

2019-08-05 03:45发布

问题:

CREATE TABLE y (Id  INT NOT NULL PRIMARY KEY      );
CREATE TABLE x (YId INT NOT NULL REFERENCES y (Id));
DECLARE @YId INT = …;

If I wanted to select all records in x where YId is equal to @YId, I could to this three different ways:

/* 1: */  SELECT x.* FROM x                              WHERE x.YId = @YId;
/* 2: */  SELECT x.* FROM x INNER JOIN y ON                    x.YId = @YId;
/* 3: */  SELECT x.* FROM x INNER JOIN y ON x.YId = y.Id WHERE y.Id  = @YId;

Query 1 is obviously the most straight-forward query, and the one with the best performance. (See the execution plan further below.) Query 3 might be more complicated than strictly necessary, but is perhaps also a fairly common solution.

What surprises me is query 2 (which, according to the execution plan, gets executed just like query 3): Its INNER JOIN clause does not depend on y at all!

Question: Why does SQL Server even accept this seemingly nonsensical join condition as valid? Are there any specific reasons or situations that explain why the ON condition is not required to refer to the joined table?

Execution plans:

回答1:

SQL Server enables it, because the JOIN syntax is the following (from MSDN):

<joined_table> ::= 
{
    <table_source> <join_type> <table_source> ON <search_condition> 
    | <table_source> CROSS JOIN <table_source> 
    | left_table_source { CROSS | OUTER } APPLY right_table_source 
    | [ ( ] <joined_table> [ ) ] 
}
<join_type> ::= 
    [ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
    JOIN

Where the <search_condition> is the following:

Specifies the condition on which the join is based. The condition can specify any predicate, although columns and comparison operators are frequently used.

When the condition specifies columns, the columns do not have to have the same name or same data type; however, if the data types are not the same, they must be either compatible or types that SQL Server can implicitly convert. If the data types cannot be implicitly converted, the condition must explicitly convert the data type by using the CONVERT function.

There can be predicates that involve only one of the joined tables in the ON clause. Such predicates also can be in the WHERE clause in the query. Although the placement of such predicates does not make a difference for INNER joins, they might cause a different result when OUTER joins are involved. This is because the predicates in the ON clause are applied to the table before the join, whereas the WHERE clause is semantically applied to the result of the join.

For more information about search conditions and predicates, see Search Condition (Transact-SQL).

The important part is highlighted with bold.

Here is an example:

SELECT
  U.UserName, P.PrivilegeName, P.Status
FROM
 dbo.Users U
LEFT JOIN dbo.Privileges P
  ON U.UserID = P.UserID AND P.PrivilegeName = 'EditComment'

This query lists all users and adds the status of the 'EditComment' privilege. If you move the condition to the WHERE clause, the result is different. (Only users with the 'EditComment' privilege will be listed.)

The above query will list all users with NULL in the Status and PrivilegeName columns in the result if there is no matching row.

The same result could be achived with a subquery:

SELECT
  U.UserName, P.PrivilegeName, P.Status
FROM
 dbo.Users U
LEFT JOIN (SELECT UserID, Status FROM dbo.Privileges WHERE P.PrivilegeName = 'EditComment') P
  ON U.UserID = P.UserID

So this is some kind of shortand of the subquery approach.

Moving the condition to the WHERE clause

SELECT
  U.UserName, P.PrivilegeName, P.Status
FROM
 dbo.Users U
LEFT JOIN dbo.Privileges P
  ON U.UserID = P.UserID
WHERE
  P.PrivilegeName = 'EditComment'

If the condition is in the WHERE clause, the result will be filtered to that rows which has a mathing row in the Privileges table and that row's PrivilegeName is EditComment. This is basically an INNER JOIN. (Filtering to a column in the WHERE clause which is in a LEFT JOINed table filters all NULL values from the result - except if there is an OR condition with IS NULL)

Execution Plans

The difference in the execution plans are because of the logical processing of the SELECT queries. FROM, ON, JOIN, WHERE, GROUP BY, WITH CUBE or WITH ROLLUP, HAVING, SELECT, DISTINCT, ORDER BY, TOP

JOIN is processed before WHERE.

Change the INNER JOIN in your query to LEFT or any OUTER join and analyze the query. The results could be different.

EDIT

The second query looks like a filtered cross-join:

/* 4: */ SELECT x.* FROM x INNER JOIN y ON                    1=1;
/* 5: */ SELECT x.* FROM x CROSS JOIN y

When you change the ON condition in the 4th query to your original one in the 2nd query, that equals to

/* 6: */ SELECT x.* FROM x CROSS JOIN y WHERE x.YId = @YId

Why does SQL Server even accept this seemingly nonsensical join condition as valid?

Because it is valid (it has a different meaning, but syntactically it is valid)

Are there any specific reasons or situations that explain why the ON condition is not required to refer to the joined table?

Reason? yes, it is syntactically valid

Situation: CROSS JOIN with filtering for example. Join 1 row from a table to all records in the resultset (e.g. the record of the current date from a point-in-time/date/calendar table for reporting purposes)



回答2:

trying to answer, shortly, and repeating @Pred answer

I think the answer is, imho, yes. It allows shorcut for subquery.

2 / SELECT x.* FROM x INNER JOIN y ON                    x.YId = @YId

is logically equivalent to

select
    tx.*
from
    (
        select x.* from x where x.YId = @YId
    ) tx
    cross join y

this allows to reduce the number of explored rows from x * y.

as an illustration :

declare @ty table (Id int not null primary key)
declare @tx table (Id Char(1) not null primary key, yId int not null)

insert into @ty values (1) ,(2), (3)
insert into @tx values ('A', 1), ('B', 1), ('C', 2)

declare @YId int = 1

SELECT x.* FROM @tx x                             WHERE x.YId = @YId
SELECT x.* FROM @tx x INNER JOIN @ty y ON   x.yId = @YId
SELECT x.* FROM @tx x INNER JOIN @ty y ON x.YId = y.Id WHERE y.Id  = @YId

will give

A   1
B   1

A   1
A   1
A   1
B   1
B   1
B   1

A   1
B   1

yes a distinct will give the same results.

On the other end: why a join if you neither need to return or test from the joined table ?

Isn't the real question, or a reformulated question: why the query optimizer isn't more smart ?



回答3:

In an nutshell the answer is that ISO Standard SQL is a strange beast. Microsoft is just following the standard that the IT industry has stuck doggedly to for the last 3 decades.



回答4:

SQL Server allows JOIN conditions that don't depend on the joined table at all

That's for full cross join :

insert into y values(1),(2);
insert into x values(1),(2);

Result of query 2:

YID
1
1

Query 1 and 3 must get the same result, but sql server is not clever enough to recognize it. That's why DBAs living for.