What is the best way to separate hyphen-delimited

2019-08-29 09:35发布

问题:

I am using SQL Server 2008 R2 and I have data in a format AA-BB-CCCCCCCC-DDDDDDDD-EEEE stored in one column. I need to separate this into 5 separate columns with T-SQL inline (I don't want to create a function for this purpose, though I may look into it if there is a serious performance gain, there are permissions issues here that I would have to deal with). I am creating a view to imitate the layout of a similar table on another server.

So, I'd like my output to look like this:

+------+------+----------+----------+------+
| Col1 | Col2 | Col3     | Col4     | Col5 |
+------+------+----------+----------+------+
| AA   | BB   | CCCCCCCC | DDDDDDDD | EEEE |
+------+------+----------+----------+------+

Now, I have something that works, but it seems entirely inefficient to me and executing queries against this view is quite time consuming in my testing. I use a CTE and XML to break out the columns, but this requires things like escaping the ampersand, etc.

So, what I currently have is this:

WITH cte (ColA, ColB, Colc, etc.)
AS
(
        SELECT 
            CONVERT(XML,'<Account><Attribute>' 
                + REPLACE(REPLACE(MY_COLUMN,'&','&amp;'),'-', '</Attribute><Attribute>') 
                + '</Attribute></Account>') as ACCOUNT_VALUE
)

SELECT
    ACCOUNT_VALUE.value('/Account[1]/Attribute[1]','varchar(2)') as Col1,
    ACCOUNT_VALUE.value('/Account[1]/Attribute[2]','varchar(2)') as Col2,
    ACCOUNT_VALUE.value('/Account[1]/Attribute[3]','varchar(8)') as Col3,
    ACCOUNT_VALUE.value('/Account[1]/Attribute[4]','varchar(8)') as Col4,
    ACCOUNT_VALUE.value('/Account[1]/Attribute[5]','varchar(4)') as Col5
FROM cte

And this returns the data fine, but takes exceptionally long. So, is there a better way to separate hyphenated data into columns within a T-SQL statement (preferably no functions, etc.)? This table has millions of rows that need to be broken out into separate columns.

I got this far via a few hours of Googling yesterday and haven't really found another option I've been able to make work.

回答1:

Try this:

CREATE TABLE Test
(
  LongText VARCHAR(400)
)

INSERT INTO Test (LongText)
VALUES('AA-BB-CCCCCCCC-DDDDDDDD-EEEE'),
('BB-CC-DDDDDDDD-EEEEEEEE-FFFF')

;WITH CTE AS
(
    --initial part
    SELECT LongText, 1 AS ColNo, LEFT(LongText, CHARINDEX('-', LongText)-1) AS Part,
      RIGHT(LongText, LEN(LongText) - CHARINDEX('-', LongText)) AS Remainder
    FROM Test
    WHERE CHARINDEX('-', LongText)>0
    --recursive part, gets 'Part' till the last '-'
    UNION ALL
    SELECT LongText, ColNo + 1 AS ColNo,LEFT(Remainder, CHARINDEX('-', Remainder)-1) AS Part,
      RIGHT(Remainder, LEN(Remainder) - CHARINDEX('-', Remainder)) AS Remainder
    FROM CTE
    WHERE CHARINDEX('-', Remainder)>0
    --recursive part, gets the last 'Part' (there is no '-')
    UNION ALL
    SELECT LongText, ColNo + 1 AS ColNo,Remainder AS Part,NULL AS Remainder
    FROM CTE
    WHERE CHARINDEX('-', Remainder)=0
)
SELECT [1],[2],[3],[4],[5]
FROM (
    SELECT LongText, ColNo, Part
    FROM CTE
  ) AS DT
PIVOT(MAX(Part) FOR ColNo IN ([1],[2],[3],[4],[5])) AS PT

SQL Fiddle



回答2:

If you know that your data has a set length, you can use something like:

Declare @value as Varchar(50);
Set @value = 'AA-BB-CCCCCCCC-DDDDDDDD-EEEE'
Select left(@value, 2) as col1, SUBSTRING(@value, 4, 2) as col2, 
       SUBSTRING (@value, 7, 8) as col3

Etc. Elsewise you could use a similar pattern using results from charindex() on the '-' character to define starting and ending points of a substring. Maciej's approach is similar to this.



回答3:

If you always have 5 parts, this kind of approach might be faster than XML handling:

select 
  left(MY_COLUMN, P1.P1-1) as PART1,
  substring(MY_COLUMN, P1.P1+1,P2.P2-P1.P1-1) as PART2,
  substring(MY_COLUMN, P2.P2+1,P3.P3-P2.P2-1) as PART3,
  substring(MY_COLUMN, P3.P3+1,P4.P4-P3.P3-1) as PART4,
  substring(MY_COLUMN, P4.P4+1,8000) as PART5
from 
  MY_TABLE
  cross apply (select charindex('-', MY_COLUMN) as P1) P1
  cross apply (select charindex('-', MY_COLUMN, P1.P1+1) as P2) P2
  cross apply (select charindex('-', MY_COLUMN, P2.P2+1) as P3) P3
  cross apply (select charindex('-', MY_COLUMN, P3.P3+1) as P4) P4
  cross apply (select charindex('-', MY_COLUMN, P4.P4+1) as P5) P5