T-SQL select query to remove non-numeric character

2019-01-02 18:57发布

I've got dirty data in a column with variable alpha length. I just want to strip out anything that is not 0-9.

I do not want to run a function or proc. I have a script that is similar that just grabs the numeric value after text, it looks like this:

Update TableName
set ColumntoUpdate=cast(replace(Columnofdirtydata,'Alpha #','') as int)
where Columnofdirtydata like 'Alpha #%'
And ColumntoUpdate is Null

I thought it would work pretty good until I found that some of the data fields I thought would just be in the format Alpha # 12345789 are not...

Examples of data that needs to be stripped

AB ABCDE # 123
ABCDE# 123
AB: ABC# 123

I just want the 123. It is true that all data fields do have the # prior to the number.

I tried substring and PatIndex, but I'm not quite getting the syntax correct or something. Anyone have any advice on the best way to address this?

Thank you!

13条回答
栀子花@的思念
2楼-- · 2019-01-02 19:00

Here's a version which pulls all digits from a string; i.e. given I'm 35 years old; I was born in 1982. The average family has 2.4 children. this would return 35198224. i.e. it's good where you've got numeric data which may have been formatted as a code (e.g. #123,456,789 / 123-00005), but isn't appropriate if you're looking to pull out specific numbers (i.e. as opposed to digits / just the numeric characters) from the text. Also it only handles digits; so won't return negative signs (-) or periods .).

declare @table table (id bigint not null identity (1,1), data nvarchar(max)) 
insert @table (data) 
values ('hello 123 its 45613 then') --outputs: 12345613
,('1 some other string 98 example 4') --outputs: 1984
,('AB ABCDE # 123') --outputs: 123 
,('ABCDE# 123') --outputs: 123
,('AB: ABC# 123') --outputs: 123
; with NonNumerics as (
    select id
    , data original
    --the below line replaces all digits with blanks
    , replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(data,'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','') nonNumeric
    from @table
)
--each iteration of the below CTE removes another non-numeric character from the original string, putting the result into the numerics column
, Numerics as (
    select id
    , replace(original, substring(nonNumeric,1,1), '') numerics
    , replace(nonNumeric, substring(nonNumeric,1,1), '') charsToreplace
    , len(replace(nonNumeric, substring(nonNumeric,1,1), '')) charsRemaining
    from NonNumerics

    union all

    select id
    , replace(numerics, substring(charsToreplace,1,1), '') numerics
    , replace(charsToreplace, substring(charsToreplace,1,1), '') charsToreplace
    , len(replace(charsToreplace, substring(charsToreplace,1,1), '')) charsRemaining
    from Numerics
    where charsRemaining > 0
)
--we select only those strings with `charsRemaining=0`; i.e. the rows for which all non-numeric characters have been removed; there should be 1 row returned for every 1 row in the original data set.
select * from Numerics where charsRemaining = 0

This code works by removing all the digits (i.e. the characters we want) from a the given strings by replacing them with blanks. Then it goes through the original string (which includes the digits) removing all of the characters that were left (i.e. the non-numeric characters), thus leaving only the digits.

The reason we do this in 2 steps, rather than just removing all non-numeric characters in the first place is there are only 10 digits, whilst there are a huge number of possible characters; so replacing that small list is relatively fast; then gives us a list of those non-numeric characters which actually exist in the string, so we can then replace that small set.

The method makes use of recursive SQL, using common table expressions (CTEs).

查看更多
骚的不知所云
3楼-- · 2019-01-02 19:00

In your case It seems like the # will always be after teh # symbol so using CHARINDEX() with LTRIM() and RTRIM() would probably perform the best. But here is an interesting method of getting rid of ANY non digit. It utilizes a tally table and table of digits to limit which characters are accepted then XML technique to concatenate back to a single string without the non-numeric characters. The neat thing about this technique is it could be expanded to included ANY Allowed characters and strip out anything that is not allowed.

DECLARE @ExampleData AS TABLE (Col VARCHAR(100))
INSERT INTO @ExampleData (Col) VALUES ('AB ABCDE # 123'),('ABCDE# 123'),('AB: ABC# 123')

DECLARE @Digits AS TABLE (D CHAR(1))
INSERT INTO @Digits (D) VALUES ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')

;WITH cteTally AS (
SELECT
    I = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
    @Digits d10
    CROSS APPLY @Digits d100
    --add more cross applies to cover longer fields this handles 100
)

SELECT *
FROM
    @ExampleData e
    OUTER APPLY (
    SELECT CleansedPhone = CAST((
    SELECT TOP 100
       SUBSTRING(e.Col,t.I,1)
    FROM
       cteTally t
       INNER JOIN @Digits d
       ON SUBSTRING(e.Col,t.I,1) = d.D
    WHERE
       I <= LEN(e.Col)
    ORDER BY
       t.I
    FOR XML PATH('')) AS VARCHAR(100))) o
查看更多
牵手、夕阳
4楼-- · 2019-01-02 19:01
 Declare @MainTable table(id int identity(1,1),TextField varchar(100))
  INSERT INTO @MainTable (TextField)
 VALUES
 ('6B32E')
 declare @i int=1
  Declare @originalWord varchar(100)=''
  WHile @i<=(Select count(*) from @MainTable)
  BEGIN
  Select @originalWord=TextField from @MainTable where id=@i

 Declare @r varchar(max) ='', @len int ,@c char(1), @x int = 0

    Select @len = len(@originalWord)
    declare @pn varchar(100)=@originalWord
    while @x <= @len 
    begin

      Select @c = SUBSTRING(@pn,@x,1)
    if(@c!='')
    BEGIN
            if ISNUMERIC(@c) = 0 and @c <> '-'
    BEGIN
     Select @r = cast(@r as varchar) + cast(replace((SELECT ASCII(@c)-64),'-','') as varchar)

   end
   ELSE
   BEGIN
    Select @r = @r + @c


   END

END


    Select @x = @x +1

    END
    Select @r
  Set @i=@i+1
  END
查看更多
像晚风撩人
5楼-- · 2019-01-02 19:04

In case if there are some characters possible between digits (e.g. thousands separators), you may try following:

declare @table table (DirtyCol varchar(100))
insert into @table values
    ('AB ABCDE # 123')
    ,('ABCDE# 123')
    ,('AB: ABC# 123')
    ,('AB#')
    ,('AB # 1 000 000')
    ,('AB # 1`234`567')
    ,('AB # (9)(876)(543)')

;with tally as (select top (100) N=row_number() over (order by @@spid) from sys.all_columns),
data as (
    select DirtyCol, Col
    from @table
        cross apply (
            select (select C + ''
            from (select N, substring(DirtyCol, N, 1) C from tally where N<=datalength(DirtyCol)) [1]
            where C between '0' and '9'
            order by N
            for xml path(''))
        ) p (Col)
    where p.Col is not NULL
)
select DirtyCol, cast(Col as int) IntCol
from data

Output is:

DirtyCol              IntCol
--------------------- -------
AB ABCDE # 123        123
ABCDE# 123            123
AB: ABC# 123          123
AB # 1 000 000        1000000
AB # 1`234`567        1234567
AB # (9)(876)(543)    9876543

For update, add ColToUpdate to select list of the data cte:

;with num as (...),
data as (
    select ColToUpdate, /*DirtyCol, */Col
    from ...
)
update data
set ColToUpdate = cast(Col as int)
查看更多
浅入江南
6楼-- · 2019-01-02 19:08

To add on to Ken's answer, this handles commas and spaces and parentheses

--Handles parentheses, commas, spaces, hyphens..
declare @table table (c varchar(256))
insert into @table
values
('This is a test 111-222-3344'),
('Some Sample Text (111)-222-3344'),
('Hello there 111222 3344 / How are you?'),
('Hello there 111 222 3344 ? How are you?'),
('Hello there 111 222 3344. How are you?')

select
replace(LEFT(SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000),
           PATINDEX('%[^0-9.-]%', SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000) + 'X') -1),'.','')
from @table
查看更多
旧时光的记忆
7楼-- · 2019-01-02 19:10

I have created a function for this

Create FUNCTION RemoveCharacters (@text varchar(30))
RETURNS VARCHAR(30)
AS
BEGIN
declare @index as int 
declare @newtexval as varchar(30)
set @index = (select PATINDEX('%[A-Z.-/?]%', @text))
if (@index =0)
begin 
return @text
end
else
begin 
set @newtexval  = (select STUFF ( @text , @index , 1 , '' ))
return dbo.RemoveCharacters(@newtexval)
end
return 0
END
GO
查看更多
登录 后发表回答