split string in column

2019-06-08 16:21发布

问题:

I have data that has come over from a hierarchical database, and it often has columns that contain data that SHOULD be in another table, if the original database had been relational.

The column's data is formatted in pairs, with LABEL\VALUE with a space as the delimiter, like this:

LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE

There is seldom more than one pair in a record, but there as many as three. There are 24 different possible Labels. There are other columns in this table, including the ID. I have been able to convert this column into a sparse array without using a cursor, with columns for ID, LABEL1, LABEL2, etc....

But this is not ideal for using in another query. My other option it to use a cursor, loop through the entire table once and write to a temp table, but I can't see to get it to work the way I want. I have been able to do it in just a few minutes in VB.NET, using a couple of nested loops, but can't manage to do it in T-SQL even using cursors. Problem is, that I would have to remember to run this program every time before I want to use the table it creates. Not ideal.

So, I read a row, split out the pairs from 'LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE' into an array, then split them out again, then write the rows

ID, LABEL1, VALUE

ID, LABEL2, VALUE

ID, LABEL3, VALUE

etc...

I realize that 'splitting' the strings here is the hard part for SQL to do, but it just seems a lot more difficult that it needs to be. What am I missing?

回答1:

Assuming that the data label contains no . characters, you can use a simple function for this:

CREATE FUNCTION [dbo].[SplitGriswold]
(
  @List   NVARCHAR(MAX),
  @Delim1 NCHAR(1),
  @Delim2 NCHAR(1)
)
RETURNS TABLE
AS
  RETURN
  ( 
    SELECT 
      Val1 = PARSENAME(Value,2),
      Val2 = PARSENAME(Value,1)
    FROM 
    (
      SELECT REPLACE(Value, @Delim2, '.') FROM
      ( 
        SELECT LTRIM(RTRIM(SUBSTRING(@List, [Number],
          CHARINDEX(@Delim1, @List + @Delim1, [Number]) - [Number])))
        FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
          FROM sys.all_objects) AS x
          WHERE Number <= LEN(@List)
          AND SUBSTRING(@Delim1 + @List, [Number], LEN(@Delim1)) = @Delim1
       ) AS y(Value)
     ) AS z(Value)
   );
GO

Sample usage:

DECLARE @x TABLE(ID INT, string VARCHAR(255));

INSERT @x VALUES
  (1, 'LABEL1\VALUE LABEL2\VALUE LABEL3\VALUE'),
  (2, 'LABEL1\VALUE2 LABEL2\VALUE2');

SELECT x.ID, t.val1, t.val2
FROM @x AS x CROSS APPLY 
 dbo.SplitGriswold(REPLACE(x.string, ' ', N'ŏ'), N'ŏ', '\') AS t;

(I used a Unicode character unlikely to appear in data above, only because a space can be problematic for things like length checks. If this character is likely to appear, choose a different one.)

Results:

ID   val1       val2
--   --------   --------
1    LABEL1     VALUE
1    LABEL2     VALUE
1    LABEL3     VALUE
2    LABEL1     VALUE2
2    LABEL2     VALUE2

If your data might have ., then you can just make the query a little more complex, without changing the function, by adding yet another character to the mix that is unlikely or impossible to be in the data:

DECLARE @x TABLE(ID INT, string VARCHAR(255));

INSERT @x VALUES
(1, 'LABEL1\VALUE.A LABEL2\VALUE.B LABEL3\VALUE.C'),
(2, 'LABEL1\VALUE2.A LABEL2.1\VALUE2.B');

SELECT x.ID, val1 = REPLACE(t.val1, N'ű', '.'), val2 = REPLACE(t.val2, N'ű', '.')
FROM @x AS x CROSS APPLY 
  dbo.SplitGriswold(REPLACE(REPLACE(x.string, ' ', 'ŏ'), '.', N'ű'), 'ŏ', '\') AS t;

Results:

ID   val1       val2
--   --------   --------
1    LABEL1     VALUE.A
1    LABEL2     VALUE.B
1    LABEL3     VALUE.C
2    LABEL1     VALUE2.A
2    LABEL2.1   VALUE2.B


回答2:

With only three values, you can manage to do this by brute force:

select (case when rest like '% %'
             then left(rest, charindex(' ', rest) - 1)
             else rest
        end) as val2,
       (case when rest like '% %'
             then substring(col, charindex(' ', col) + 1, 1000)
        end) as val3
from (select (case when col like '% %'
                   then left(col, charindex(' ', col) - 1)
                   else col
              end) as val1,
             (case when col like '% %'
                   then substring(col, charindex(' ', col) + 1, 1000)
              end) as rest
      from t
     ) t


回答3:

Using the SQL split string function given at referenced SQL tutorial, you can split the label-value pairs as following

SELECT
id, max(label) as label, max(value) as value
FROM (
SELECT 
    s.id, 
    label = case when t.id = 1 then t.val else NULL end,
    value = case when t.id = 2 then t.val else NULL end
FROM dbo.Split(N'LABEL1\VALUE1 LABEL2\VALUE2 LABEL3\VALUE3', ' ') s
CROSS APPLY dbo.Split(s.val, '\') t
) t
group by id

You can see that the split string function is called twice, first for splitting pairs from others. Then the second split function joined to previous one using CROSS APPLY splits labels from pairs