T-SQL Grouping rows from the MAX length columns in

2019-04-11 15:09发布

问题:

i'm trying to come up with a way to combine rows in a table based on the longest string in any of the rows based on a row key

example

CREATE TABLE test1 
    (akey int not null , 
    text1 varchar(50) NULL, 
    text2 varchar(50) NULL, 
    text3 varchar(50) NULL  )


INSERT INTO test1 VALUES ( 1,'Winchester Road','crawley',NULL)
INSERT INTO test1 VALUES ( 1,'Winchester Rd','crawley','P21869')
INSERT INTO test1 VALUES ( 1,'Winchester Road','crawley estate','P21869')
INSERT INTO test1 VALUES ( 1,'Winchester Rd','crawley','P21869A')
INSERT INTO test1 VALUES ( 2,'','birmingham','P53342B')
INSERT INTO test1 VALUES ( 2,'Smith Close','birmingham North East','P53342')
INSERT INTO test1 VALUES ( 2,'Smith Cl.',NULL,'P53342B')
INSERT INTO test1 VALUES ( 2,'Smith Close','birmingham North','P53342')

with these rows i would be looking for the result of :

1   Winchester Road,    crawley estate, P21869A
2   Smith Close,    birmingham North East,  P53342B

EDIT: the results above need to be in a table rather than just a comma separated string

as you can see in the result, the output should be the longest text column in the range of the 'akey' field.

i'm trying to come up with a solution that does not involve lots of subqueries on each column, the actual table has 32 columns and over 13 million rows.

the reason i'm doing this is to create a cleaned-up table that has the best results in each column for just one ID per row

this is my first post, so let me know if you need any more info, and i'm happy to hear about any best practices about posting that i've broken!

thanks

Ben.

回答1:

SELECT A.akey, 
    (
        SELECT TOP 1 T1.text1
        FROM test1 T1
        WHERE T1.akey=A.akey AND LEN(T1.TEXT1) = MAX(LEN(A.text1))
    ) AS TEXT1,
    (
        SELECT TOP 1 T2.text2
        FROM test1 T2
        WHERE T2.akey=A.akey AND LEN(T2.TEXT2) = MAX(LEN(A.text2))
    ) AS TEXT2,
    (
        SELECT TOP 1 T3.text3
        FROM test1 T3
        WHERE T3.akey=A.akey AND LEN(T3.TEXT3) = MAX(LEN(A.text3))
    ) AS TEXT3
FROM TEST1 AS A
GROUP BY A.akey

I just realized you said you have 32 columns. I don't see a good way to do that, unless UNPIVOT would allow you to create separate rows (akey, textn) for each text* column.

Edit: I may not have a chance to finish this today, but UNPIVOT looks useful:

;
WITH COLUMNS AS
(
    SELECT akey, [Column], ColumnValue
    FROM
        (
            SELECT X.Akey, X.Text1, X.Text2, X.Text3
            FROM test1 X
        ) AS p
    UNPIVOT (ColumnValue FOR [Column] IN (Text1, Text2, Text3))
    AS UNPVT
)
SELECT *
FROM COLUMNS
ORDER BY akey,[Column], LEN(ColumnValue)


回答2:

This seems really ugly, but at least works (on SQL2K) and doesn't need subqueries:

select test1.akey, A.text1, B.text2, C.text3
from test1
inner join test1 A on A.akey = test1.akey 
inner join test1 B on B.akey = test1.akey 
inner join test1 C on C.akey = test1.akey 
group by test1.akey, A.text1, B.text2, C.text3
having len(a.text1) = max(len(test1.text1))
   and len(B.text2) = max(len(test1.text2))
   and len(C.text3) = max(len(test1.text3))
order by test1.akey

I must admit that it needs an inner join for each column and I wonder how this could impact on the 32 columns x 13millions record table... I try both this approach and the one based one subqueries and looked at executions plans: I'ld actually be curious to know