parsing a sql string for integer values with multi

2019-07-14 15:54发布

I found some related examples here on SO, but not one that for this case specifically. Here goes:

I need to use SQL to parse data sourced from a flat file that is pipe delimited. One of the fields is sub-formatted as follows. My end state is to sum the integers within the field, but my question here is to see of ways to use a SQL SELECT to parse/extract JUST the integers. The pattern of the sub-formatting will always be where the desired integers will be preceded by a tilde (~) and followed by an asterisk (*). The number of sub fields may vary too (my example has 5, but there could more or less). The 4 char TAG name is of no importance.

So here is a sample:

|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|

From this example, all I would want for processing is the final number of 219. Again, I can work on the sum part as a further step; just interested in getting the numbers.

I know how to handle this quite easily with UNIX tools (ie AWK, sed, etc), but not in SQL. Any suggestions will help! Thanks!

4条回答
太酷不给撩
2楼-- · 2019-07-14 16:25

Is it this what you want (it calculates the sum 219)? Sorry for the ugly formatting, but for sure you can make it "pretty" by yourself.

declare @Input varchar(255),
        @InputModified varchar(255)

set @Input = '|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|'


 SELECT 
@InputModified = 
 (
     CAST(CAST((
            SELECT SUBSTRING(@Input, Number, 1)
            FROM master..spt_values
            WHERE Type='p' AND Number <= LEN(@Input) AND
            (
                SUBSTRING(@Input, Number, 1) LIKE '[0-9]' 
                or
                SUBSTRING(@Input, Number, 1) LIKE '~'
                or
                SUBSTRING(@Input, Number, 1) LIKE '.'
            )
            FOR XML Path(''))
        AS xml) AS varchar(MAX))
)


select sum(cast(Integers as float))
from
(
    SELECT 
    LTRIM(RTRIM(m.n.value('.[1]','varchar(8000)'))) AS Integers
    FROM
    (
        SELECT  CAST('<XMLRoot><RowData>' + REPLACE(@InputModified,'~','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
    )t
    CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)
) Integers
where IsNumeric(Integers.Integers) = 1
查看更多
做个烂人
3楼-- · 2019-07-14 16:36

This will come back with 219:

  • In Cut1 I let the string start with the first number.
  • In Cut2 I cut the final pipe and add an asterisk
  • In Splitted I replace the ~ with XML tags thus making it easy to split the string
  • In the final Select the pure numbers are summed up...

Now the code

 DECLARE @str VARCHAR(100) = '|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|';

 WITH Cut1 AS
(
    SELECT SUBSTRING(@str,CHARINDEX('~',@str,1)+1,1000) AS c1
)
,Cut2 AS
(
    SELECT SUBSTRING(Cut1.c1,1,LEN(Cut1.c1)-1) + '*' AS c2 FROM Cut1
)
,Splitted AS
(
    SELECT CAST('<x>' + REPLACE(Cut2.c2,'~','</x><x>') + '</x>' AS XML) AS AsXML FROM Cut2
)
SELECT SUM(CAST(SUBSTRING(numbers.value('.','varchar(100)'),1,CHARINDEX('*',numbers.value('.','varchar(100)'),1)-1) AS FLOAT))
FROM Splitted
CROSS APPLY AsXML.nodes('/x') AS the(numbers)
查看更多
smile是对你的礼貌
4楼-- · 2019-07-14 16:41

additional approach using recursive cte:

DECLARE @Input VARCHAR(255) 
SET @Input = '|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|'
;WITH cte ( n )
       AS ( SELECT   n = 1
            UNION ALL
            SELECT   n + 1
            FROM     cte
            WHERE    n <= LEN(@Input)
           )

 SELECT  SUM(CONVERT(FLOAT, val))
 FROM   (SELECT SUBSTRING(STUFF(@Input, 1, t.n, ''), 1,
                                PATINDEX('%[*|]%', STUFF(@Input, 1, t.n, '')) - 1) AS Val
         FROM cte AS t
         WHERE SUBSTRING(@Input, t.n, 
                         PATINDEX('%[*|]%', SUBSTRING(@Input, t.n, 99))) LIKE '~%[*|]'
         ) AS T

test is HERE

查看更多
手持菜刀,她持情操
5楼-- · 2019-07-14 16:42

I have tried to be different.Also I am not following any delimiter. So not following any delimiter may be strength of this technique.

I just extract the first number from string then delete string from 0 till that number and so on.

So it is very clear that we can do it using recursive CTE,while loop and using row_number.

Here I have tried using Recursive CTE which may not be optimize,but sometime it is fun to be different.

DECLARE @Input VARCHAR(255) 
SET @Input = '|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|';

This line do the trick for me,

select Left(SubString(@Input, PatIndex('%[0-9.]%', @Input),8000),
PatIndex('%[^0-9.]%', SubString(@Input, 
PatIndex('%[0-9.]%', @Input), 8000) )-1) 

Complete Script,

;With CTE as
(
Select @Input Input, Left(SubString(@Input, PatIndex('%[0-9.]%', @Input), 8000), PatIndex('%[^0-9.]%', SubString(@Input, PatIndex('%[0-9.]%', @Input), 8000) )-1) XX

union all

select cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255))
,Left(SubString(cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255)), PatIndex('%[0-9.]%', cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255))), 8000), PatIndex('%[^0-9.]%', SubString(cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255)), PatIndex('%[0-9.]%', cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255))), 8000) )-1)
from cte where len(input)>0

)

select sum(cast(XX as float)) from cte
where xx is not null or xx<>''
查看更多
登录 后发表回答