Why does EXCEPT exist in T-SQL?

2019-01-22 12:36发布

问题:

I was just reading about EXCEPT and INTERSECT in the MSDN Library and came across this example of how to use INTERSECT:

USE AdventureWorks2008R2 GO 
SELECT ProductID 
FROM Production.Product
INTERSECT
SELECT ProductID 
FROM Production.WorkOrder ;
--Result: 238 Rows (products that have work orders)

Maybe I'm old-fashioned, but I typically would use the following code to achieve the same result:

SELECT P.ProductID 
FROM Production.Product P 
INNER JOIN Production.WorkOrder W ON W.ProductID = P.ProductID

Am I missing something, or is INTERSECT the same as INNER JOIN? Is there a performance benefit to using one over the other?

Same question for EXCEPT. How is this:

USE AdventureWorks2008R2;
GO
SELECT ProductID 
FROM Production.Product
EXCEPT
SELECT ProductID 
FROM Production.WorkOrder ;
--Result: 266 Rows (products without work orders)

different from this:

SELECT P.ProductID 
FROM Production.Product P 
LEFT JOIN Production.WorkOrder W ON W.ProductID = P.ProductID
WHERE W.ProductID IS NULL

?

回答1:

I'm going to focus on EXCEPT just because I'm more familiar with it. Also, as a disclaimer, my examples will be in Sqlite, since I'm on a Linux box. However, both Sqlite and SQL Server should support the functionality.

Both INTERSECT and EXCEPT are set operators, stemming from the underlying ideas in relational algebra. They operate on distinct values, being set operators.

Your example is simplistic. I'll give a counterexample, using a Sqlite version of the Northwind sample database.

Let's say that you want to get the CustomerIDs of all customers who made an order with EmployeeID of 5, but NOT those who also made an order with EmployeeID of 6. This is simple and natural with an EXCEPT.

SELECT CustomerID FROM orders
WHERE EmployeeID = 5
EXCEPT
SELECT CustomerID FROM orders
WHERE EmployeeID = 6

This returns 14 rows on my version of Northwind.

Suppose you decide to rewrite this using JOINs. Maybe something like this?

SELECT o1.CustomerID
FROM orders o1 INNER JOIN orders o2 ON o1.CustomerID = o2.CustomerID
WHERE o1.EmployeeID = 5 AND o2.EmployeeID != 6

Whoops, 525 rows. Maybe add a DISTINCT?

SELECT DISTINCT o1.CustomerID
FROM orders o1 INNER JOIN orders o2 ON o1.CustomerID = o2.CustomerID
WHERE o1.EmployeeID = 5 AND o2.EmployeeID != 6

Now it's 28 rows, still much more than what we were getting with EXCEPT. The reason is that this isn't removing CustomerIDs that have made an order with 6. Rather, it returns all CustomerIDs that have an order with 5 and some EmployeeID other than 6, whether or not they also have an order with EmployeeID 6.

In short, EXCEPT and INTERSECT are set operators that compare two queries, returning unique tuples, and certainly have their use.



回答2:

  • INTERSECT and EXCEPT are semi-joins
  • JOIN is equi-join

So when you join 2 tables that match, say, 5 rows and 3 rows

  • JOIN gives 15 rows
  • INTERSECT gives 3 rows

EXCEPT is similar to OUTER JOIN for the same reason

While we're on about semi-joins, then mostly

  • INTERSECT gives the same results as EXISTS
  • EXCEPT gives the same results as NOT EXISTS

The "mostly" comes because both INTERSECT and EXCEPT

  • treat NULL differently: For a fully worked example, see this by Paul White aka SQL Kiwi
  • apply DISTINCT

Edit, Quick demo of all this

DECLARE @t1 TABLE (t1col INT);
INSERT @t1 VALUES (1), (2), (2), (3), (3), (5), (5);

DECLARE @t2 TABLE (t2col INT);
INSERT @t2 VALUES (1), (2), (3), (4);

SELECT 'INNER JOIN', * FROM @t1 t1 JOIN @t2 t2 ON t1.t1col = t2.t2col -- same both ways

SELECT 't1 INTERSECT t2', * FROM @t1 INTERSECT SELECT 't1 INTERSECT t2', * FROM @t2;

SELECT 't2 INTERSECT t1', * FROM @t2 INTERSECT SELECT 't2 INTERSECT t1', * FROM @t1;

SELECT 't1 EXISTS t2', * FROM @t1 t1
WHERE EXISTS (SELECT * FROM @t2 t2 WHERE t1.t1col = t2.t2col);

SELECT 't2 EXISTS t1', * FROM @t2 t2
WHERE EXISTS (SELECT * FROM @t1 t1 WHERE t1.t1col = t2.t2col);

SELECT 't1 LEFT JOIN t2, IS NULL', * FROM @t1 t1 LEFT JOIN @t2 t2 ON t1.t1col = t2.t2col WHERE t2.t2col IS NULL
SELECT 't2 LEFT JOIN t1, IS NULL', * FROM @t2 t2 LEFT JOIN @t1 t1 ON t1.t1col = t2.t2col WHERE t1.t1col IS NULL

SELECT 't1 EXCEPT t2', * FROM @t1 EXCEPT SELECT 't1 EXCEPT t2', * FROM @t2;

SELECT 't2 EXCEPT t1', * FROM @t2 EXCEPT SELECT 't2 EXCEPT t1', * FROM @t1;

SELECT 't1 NOT EXISTS t2', * FROM @t1 t1
WHERE NOT EXISTS (SELECT * FROM @t2 t2 WHERE t1.t1col = t2.t2col);

SELECT 't2 NOT EXISTS t1', * FROM @t2 t2
WHERE NOT EXISTS (SELECT * FROM @t1 t1 WHERE t1.t1col = t2.t2col);

Update: Feb 2013. Added extra column to describe the operation



回答3:

Your examples of your "equivalent" queries are wrong - the query with INTERSECT not always return the same result as INNER JOIN and the same for EXCEPT and LEFT JOIN.

Look at particular example about INTERSECT:

DECLARE @t TABLE(t INT NOT NULL)
DECLARE @x TABLE(x INT NOT NULL)

INSERT @t
VALUES (1), (2), (3)

INSERT @x VALUES(1), (1), (1)

SELECT t FROM @t
INTERSECT SELECT x FROM @x

SELECT t FROM @t
INNER JOIN @x ON x = t

INTERSECT is more like (but not the same) as IN clause:

SELECT t FROM @t
WHERE t IN (select x FROM @x)

or as EXISTS

SELECT t FROM @t
WHERE EXISTS (select * FROM @x WHERE x = t)

The same examples you may adapt to EXCEPT clause.



回答4:

In my opinion EXCEPT and INTERSECT are used to do the same things as the JOIN command, but it simpler with tables that don't have Primary Keys, for example:

with INTERSECT:

SELECT FIRSTNAME,
       LASTNAME,
       ADDRESSLINE1,
       CITY,
       STATEPROVINCECODE,
       POSTALCODE
FROM   MANAGER 
EXCEPT
SELECT FIRSTNAME,
       LASTNAME,
       ADDRESSLINE1,
       CITY,
       STATEPROVINCECODE,
       POSTALCODE
FROM   CUSTOMER

And to have the same results with JOIN, you must do:

SELECT   M.FIRSTNAME,
         M.LASTNAME,
         M.ADDRESSLINE1,
         M.CITY,
         M.STATEPROVINCECODE,
         M.POSTALCODE
FROM     MANAGER M
WHERE    NOT EXISTS (SELECT *
                     FROM   CUSTOMER C
                     WHERE  M.FIRSTNAME = C.FIRSTNAME
                            AND M.LASTNAME = C.LASTNAME
                            AND M.ADDRESSLINE1 = C.ADDRESSLINE1
                            AND M.CITY = C.CITY
                            AND M.POSTALCODE = C.POSTALCODE)
GROUP BY M.FIRSTNAME,M.LASTNAME,M.ADDRESSLINE1,M.CITY,
         M.STATEPROVINCECODE,M.POSTALCODE 

More information here.



标签: tsql