可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I ran into an interesting SQL problem today and while I came up with a solution that works I doubt it's the best or most efficient answer. I defer to the experts here - help me learn something and improve my query! RDBMS is SQL Server 2008 R2, query is part of an SSRS report that will run against about 100,000 rows.
Essentially I have a list of IDs that could have multiple values associated with them, the values being Yes, No, or some other string. For ID x, if any of the values are a Yes, x should be Yes, if they are all No, it should be No, if they contain any other values but yes and no, display that value. I only want to return 1 row per ID, no duplicates.
The simplified version and test case:
DECLARE @tempTable table ( ID int, Val varchar(1) )
INSERT INTO @tempTable ( ID, Val ) VALUES ( 10, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 13, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 15, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 16, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 17, 'F')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'P')
SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
FROM @tempTable t
LEFT JOIN
(
SELECT ID, Val
FROM @tempTable
WHERE Val = 'Y'
) t2 ON t.ID = t2.ID
LEFT JOIN
(
SELECT
ID, Val FROM @tempTable
WHERE Val = 'N'
) t3 ON t.ID = t3.ID
LEFT JOIN
(
SELECT ID, Val
FROM @tempTable
WHERE Val <> 'Y' AND Val <> 'N'
) t4 ON t.ID = t4.ID
Thanks in advance.
回答1:
Let's answer an easier problem: for each id, get the Val which is last in the alphabet. This will work if Y and N are the only values. And the query is much simpler:
SELECT t.ID, MAX(t.Val) FROM t GROUP BY t.ID;
So, reduce your case to the simple case. Use an enum (if your DB supports it) or break the value codes into another table with a collation column (in this case, you could have 1 for Y, 2 for N, and 999 for all other possible values, and you want the smallest). Then
SELECT ID, c.Val FROM
(SELECT t.ID, MIN(codes.collation) AS mx
FROM t join codes on t.Val = codes.Val GROUP BY t.ID) AS q
JOIN codes c ON mx=c.collation;
Here codes has two columns, Val and Collation.
You can also do this with a CTE type query, as long as you have the Values ordered as you want them. This approach has one join to a small lookup table and should be much, much faster than 3 self-joins.
WITH q AS (SELECT t.id, t.Val, ROW_NUMBER() AS r FROM t JOIN codes ON t.Val=codes.Val
PARTITION BY t.id ORDER BY codes.collation)
SELECT q.id, q.Val WHERE r=1;
回答2:
I'd change it to this just to make it easier to read:
SELECT DISTINCT t.ID, COALESCE(t2.Val, t3.Val, t4.Val)
FROM @tempTable t
LEFT JOIN @tempTable t2 ON t.ID = t2.ID and t2.Val = 'Y'
LEFT JOIN @tempTable t3 ON t.ID = t3.ID and t3.Val = 'N'
LEFT JOIN @tempTable t4 ON t.ID = t4.ID and t4.Val <> 'Y' AND t4.Val <> 'N'
Gives the same results as your example.
I also looked at the execution plans for both and they looked exactly the same, I doubt you'd see any performance difference.
回答3:
Try this:
;WITH a AS (
SELECT
ID,
SUM(CASE Val WHEN 'Y' THEN 1 ELSE 0 END) AS y,
SUM(CASE Val WHEN 'N' THEN 0 ELSE 1 END) AS n,
MIN(CASE WHEN Val IN ('Y','N') THEN NULL ELSE Val END) AS first_other
FROM @tempTable
GROUP BY ID
)
SELECT
ID,
CASE WHEN y > 0 THEN 'Y' WHEN n = 0 THEN 'N' ELSE first_other END AS Val
FROM a
- If there are any 'Y' values then the sum of y will be greater than 0
- If all values are 'N' then the sum of n will be zero
- Get the first non 'Y' or 'N' character available if needed
- In this case the result can be determined with only one pass through
the table
回答4:
I'm reading your spec like this:
- if any ID is Y then Y
- if all IDs are N then N
- else display value (other than Y or N)
eliminate rows per (1)
delete from @tempTable
where not Val='Y' and ID in (
select distinct ID
from @tempTable
where Val='Y'
)
select distinct to eliminate multiple N's per (2).
select distinct * from @tempTable
group up various "other" values to get a single row per ID.
SELECT A.Id, AllVals =
SubString(
(SELECT ', ' + B.Val
FROM C as B
WHERE A.Id = B.Id
FOR XML PATH ( '' ) ), 3, 1000)
FROM C as A
GROUP BY Id
Entire runnable query:
declare @tempTable table (ID int, Val char(1))
INSERT INTO @tempTable ( ID, Val ) VALUES ( 10, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 11, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 12, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 13, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 14, 'N')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 15, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 16, 'Y')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 17, 'F')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'P')
INSERT INTO @tempTable ( ID, Val ) VALUES ( 18, 'F')
delete from @tempTable
where not Val='Y' and ID in (
select distinct ID
from @tempTable
where Val='Y'
);
WITH C as (select distinct * from @tempTable)
SELECT A.Id, AllVals =
SubString(
(SELECT ', ' + B.Val
FROM C as B
WHERE A.Id = B.Id
FOR XML PATH ( '' ) ), 3, 1000)
FROM C as A
GROUP BY Id
OUTPUT:
Id AllVals
10 Y
11 N
12 Y
13 N
14 Y
15 Y
16 Y
17 F
18 F, P