I found some related examples here on SO, but not one that for this case specifically. Here goes:
I need to use SQL to parse data sourced from a flat file that is pipe delimited. One of the fields is sub-formatted as follows. My end state is to sum the integers within the field, but my question here is to see of ways to use a SQL SELECT to parse/extract JUST the integers. The pattern of the sub-formatting will always be where the desired integers will be preceded by a tilde (~) and followed by an asterisk (*). The number of sub fields may vary too (my example has 5, but there could more or less). The 4 char TAG name is of no importance.
So here is a sample:
|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|
From this example, all I would want for processing is the final number of 219. Again, I can work on the sum part as a further step; just interested in getting the numbers.
I know how to handle this quite easily with UNIX tools (ie AWK, sed, etc), but not in SQL. Any suggestions will help! Thanks!
This will come back with 219:
- In Cut1 I let the string start with the first number.
- In Cut2 I cut the final pipe and add an asterisk
- In Splitted I replace the ~ with XML tags thus making it easy to split the string
- In the final Select the pure numbers are summed up...
Now the code
DECLARE @str VARCHAR(100) = '|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|';
WITH Cut1 AS
(
SELECT SUBSTRING(@str,CHARINDEX('~',@str,1)+1,1000) AS c1
)
,Cut2 AS
(
SELECT SUBSTRING(Cut1.c1,1,LEN(Cut1.c1)-1) + '*' AS c2 FROM Cut1
)
,Splitted AS
(
SELECT CAST('<x>' + REPLACE(Cut2.c2,'~','</x><x>') + '</x>' AS XML) AS AsXML FROM Cut2
)
SELECT SUM(CAST(SUBSTRING(numbers.value('.','varchar(100)'),1,CHARINDEX('*',numbers.value('.','varchar(100)'),1)-1) AS FLOAT))
FROM Splitted
CROSS APPLY AsXML.nodes('/x') AS the(numbers)
Is it this what you want (it calculates the sum 219)? Sorry for the ugly formatting, but for sure you can make it "pretty" by yourself.
declare @Input varchar(255),
@InputModified varchar(255)
set @Input = '|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|'
SELECT
@InputModified =
(
CAST(CAST((
SELECT SUBSTRING(@Input, Number, 1)
FROM master..spt_values
WHERE Type='p' AND Number <= LEN(@Input) AND
(
SUBSTRING(@Input, Number, 1) LIKE '[0-9]'
or
SUBSTRING(@Input, Number, 1) LIKE '~'
or
SUBSTRING(@Input, Number, 1) LIKE '.'
)
FOR XML Path(''))
AS xml) AS varchar(MAX))
)
select sum(cast(Integers as float))
from
(
SELECT
LTRIM(RTRIM(m.n.value('.[1]','varchar(8000)'))) AS Integers
FROM
(
SELECT CAST('<XMLRoot><RowData>' + REPLACE(@InputModified,'~','</RowData><RowData>') + '</RowData></XMLRoot>' AS XML) AS x
)t
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)
) Integers
where IsNumeric(Integers.Integers) = 1
additional approach using recursive cte:
DECLARE @Input VARCHAR(255)
SET @Input = '|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|'
;WITH cte ( n )
AS ( SELECT n = 1
UNION ALL
SELECT n + 1
FROM cte
WHERE n <= LEN(@Input)
)
SELECT SUM(CONVERT(FLOAT, val))
FROM (SELECT SUBSTRING(STUFF(@Input, 1, t.n, ''), 1,
PATINDEX('%[*|]%', STUFF(@Input, 1, t.n, '')) - 1) AS Val
FROM cte AS t
WHERE SUBSTRING(@Input, t.n,
PATINDEX('%[*|]%', SUBSTRING(@Input, t.n, 99))) LIKE '~%[*|]'
) AS T
test is HERE
I have tried to be different.Also I am not following any delimiter.
So not following any delimiter may be strength of this technique.
I just extract the first number from string then delete string from 0 till that number and so on.
So it is very clear that we can do it using recursive CTE,while loop and using row_number.
Here I have tried using Recursive CTE which may not be optimize,but sometime it is fun to be different.
DECLARE @Input VARCHAR(255)
SET @Input = '|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|';
This line do the trick for me,
select Left(SubString(@Input, PatIndex('%[0-9.]%', @Input),8000),
PatIndex('%[^0-9.]%', SubString(@Input,
PatIndex('%[0-9.]%', @Input), 8000) )-1)
Complete Script,
;With CTE as
(
Select @Input Input, Left(SubString(@Input, PatIndex('%[0-9.]%', @Input), 8000), PatIndex('%[^0-9.]%', SubString(@Input, PatIndex('%[0-9.]%', @Input), 8000) )-1) XX
union all
select cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255))
,Left(SubString(cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255)), PatIndex('%[0-9.]%', cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255))), 8000), PatIndex('%[^0-9.]%', SubString(cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255)), PatIndex('%[0-9.]%', cast(stuff(Input,1, charindex(XX,Input)+len(XX)-1,'') as varchar(255))), 8000) )-1)
from cte where len(input)>0
)
select sum(cast(XX as float)) from cte
where xx is not null or xx<>''