Sql NOT IN optimization

2019-09-05 04:05发布

问题:

I'm having trouble optimizing a query. Here are two example tables I am working with:

Table 1:
UID
A
B

Table 2:
UID   Parent
A       2
B       2
C       3
D       2
E       3
F       2

Here is what I am doing now:

Select Table1.UID
FROM Table1 R

INNER JOIN Table2 T ON
R.UID = T.UID

INNER JOIN Table2 E ON
T.PARENT = E.PARENT
AND E.UID NOT IN (SELECT UID FROM Table1)

I'm trying to avoid using the NOT IN clause because of obvious hindrances in performance for large numbers of records.

I know the typical ways to avoid NOT IN clauses like the LEFT JOIN where the other table is null, but can't seem to get what I want with all of the other Joins going on.

I will continue working and post if I find a solution.

EDIT: Here is what I am trying to end up with

After the first Inner Join I would have

A
B

AFter the second Inner join I would have:

A    D 
A    F
B    D
B    F

The second column above is just to represent that it is matching to the other UIDs with the same parent, but I still need the As and Bs as the UID.

EDIT: RDBMS is SQL server 2005, 2008r2, 2012

Table1 is declared in the query with no index

DECLARE @Table1 TABLE ( [UNIQUE_ID] INT PRIMARY KEY )

Table2 has a clustered index on Unique ID

回答1:

The general approach to this is to use a LEFT JOIN with a where clause that only selects the non-matching rows:

Select Table1.UID
FROM Table1 R    
JOIN Table2 T ON R.UID = T.UID
JOIN Table2 E ON T.PARENT = E.PARENT
LEFT JOIN Table3 E2 ON E.UID = R.UID
WHERE E2.UID IS NULL

回答2:

SELECT Table2.*
FROM Table2
INNER JOIN (
  SELECT id FROM Table2
  EXCEPT
  SELECT id FROM Table1
) AS Filter ON (Table2.id = Filter.id)