How can I improve this SQL query?

2019-04-09 17:42发布

问题:

I ran into an interesting SQL problem today and while I came up with a solution that works I doubt it's the best or most efficient answer. I defer to the experts here - help me learn something and improve my query! RDBMS is SQL Server 2008 R2, query is part of an SSRS report that will run against about 100,000 rows.

Essentially I have a list of IDs that could have multiple values associated with them, the values being Yes, No, or some other string. For ID x, if any of the values are a Yes, x should be Yes, if they are all No, it should be No, if they contain any other values but yes and no, display that value. I only want to return 1 row per ID, no duplicates.

The simplified version and test case:

DECLARE @tempTable table ( ID int, Val varchar(1) )

INSERT INTO @tempTable ( ID, Val ) VALUES ( 10, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 13, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 15, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 16, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 17, 'F')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'P')


SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
FROM @tempTable t
LEFT JOIN
(
    SELECT ID, Val
    FROM @tempTable
    WHERE Val = 'Y'
) t2 ON t.ID = t2.ID
LEFT JOIN
(
    SELECT 
    ID, Val FROM @tempTable
    WHERE Val = 'N'
) t3 ON t.ID = t3.ID
LEFT JOIN
(
    SELECT ID, Val
    FROM @tempTable
    WHERE Val <> 'Y' AND Val <> 'N'
) t4 ON t.ID = t4.ID

Thanks in advance.

回答1:

Let's answer an easier problem: for each id, get the Val which is last in the alphabet. This will work if Y and N are the only values. And the query is much simpler:

SELECT t.ID, MAX(t.Val) FROM t GROUP BY t.ID;

So, reduce your case to the simple case. Use an enum (if your DB supports it) or break the value codes into another table with a collation column (in this case, you could have 1 for Y, 2 for N, and 999 for all other possible values, and you want the smallest). Then

SELECT ID, c.Val FROM
     (SELECT t.ID, MIN(codes.collation) AS mx
      FROM t join codes on t.Val = codes.Val GROUP BY t.ID) AS q
JOIN codes c ON mx=c.collation;

Here codes has two columns, Val and Collation.

You can also do this with a CTE type query, as long as you have the Values ordered as you want them. This approach has one join to a small lookup table and should be much, much faster than 3 self-joins.

WITH q AS (SELECT t.id, t.Val, ROW_NUMBER() AS r FROM t JOIN codes ON t.Val=codes.Val 
    PARTITION BY t.id ORDER BY codes.collation)
SELECT q.id, q.Val WHERE r=1;            


回答2:

I'd change it to this just to make it easier to read:

SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
FROM @tempTable t
LEFT JOIN @tempTable t2 ON t.ID = t2.ID and t2.Val = 'Y'
LEFT JOIN @tempTable t3 ON t.ID = t3.ID and  t3.Val = 'N'
LEFT JOIN @tempTable t4 ON t.ID = t4.ID and t4.Val <> 'Y' AND t4.Val <> 'N'

Gives the same results as your example.

I also looked at the execution plans for both and they looked exactly the same, I doubt you'd see any performance difference.



回答3:

Try this:

;WITH a AS ( 
SELECT
  ID,
  SUM(CASE Val WHEN 'Y' THEN 1 ELSE 0 END) AS y,
  SUM(CASE Val WHEN 'N' THEN 0 ELSE 1 END) AS n,
  MIN(CASE WHEN Val IN ('Y','N') THEN NULL ELSE Val END) AS first_other
FROM @tempTable
GROUP BY ID
) 
SELECT
  ID,
  CASE WHEN y > 0 THEN 'Y' WHEN n = 0 THEN 'N' ELSE first_other END AS Val
FROM a 
  • If there are any 'Y' values then the sum of y will be greater than 0
  • If all values are 'N' then the sum of n will be zero
  • Get the first non 'Y' or 'N' character available if needed
  • In this case the result can be determined with only one pass through the table


回答4:

I'm reading your spec like this:

  1. if any ID is Y then Y
  2. if all IDs are N then N
  3. else display value (other than Y or N)

eliminate rows per (1)

delete from @tempTable
where not Val='Y' and ID in (
    select distinct ID
    from @tempTable
    where Val='Y'
)

select distinct to eliminate multiple N's per (2).

select distinct * from @tempTable

group up various "other" values to get a single row per ID.

SELECT A.Id, AllVals = 
    SubString(
        (SELECT ', ' + B.Val 
         FROM C as B 
         WHERE A.Id = B.Id 
         FOR XML PATH ( '' ) ), 3, 1000) 
FROM C as A 
GROUP BY Id

Entire runnable query:

declare @tempTable table (ID int, Val char(1))
INSERT INTO @tempTable ( ID, Val ) VALUES ( 10, 'Y') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 13, 'N') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'Y') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'N') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 15, 'Y') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 16, 'Y') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 17, 'F') 
INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'P')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'F')
delete from @tempTable
where not Val='Y' and ID in (
    select distinct ID
    from @tempTable
    where Val='Y'
);
WITH C as (select distinct * from @tempTable)
SELECT A.Id, AllVals = 
    SubString(
        (SELECT ', ' + B.Val 
         FROM C as B 
         WHERE A.Id = B.Id 
         FOR XML PATH ( '' ) ), 3, 1000) 
FROM C as A 
GROUP BY Id

OUTPUT:

Id  AllVals
10  Y
11  N
12  Y
13  N
14  Y
15  Y
16  Y
17  F
18  F, P