Regex pattern inside SQL Replace function?

2018-12-31 20:38发布

SELECT REPLACE('<strong>100</strong><b>.00 GB', '%^(^-?\d*\.{0,1}\d+$)%', '');

I want to replace any markup between two parts of the number with above regex, but it does not seem to work. I'm not sure if it is regex syntax that's wrong because I tried simpler one such as '%[^0-9]%' just to test but it didn't work either. Does anyone know how can I achieve this?

8条回答
春风洒进眼中
2楼-- · 2018-12-31 21:09

Wrapping the solution inside a SQL function could be useful if you want to reuse it. I'm even doing it at the cell level, that's why I'm putting this as a different answer:

CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(300))
RETURNS VARCHAR(300)
BEGIN
    DECLARE @str VARCHAR(300) = @string;
    DECLARE @Pattern VARCHAR (20) = '%[^a-zA-Z0-9]%';
    DECLARE @Len INT;
    SELECT @Len = LEN(@String); 
    WHILE @Len > 0 
    BEGIN
        SET @Len = @Len - 1;
        IF (PATINDEX(@Pattern,@str) > 0)
            BEGIN
                SELECT @str = STUFF(@str, PATINDEX(@Pattern,@str),1,'');    
            END
        ELSE
        BEGIN
            BREAK;
        END
    END     
    RETURN @str
END
查看更多
大哥的爱人
3楼-- · 2018-12-31 21:18

I stumbled across this post looking for something else but thought I'd mention a solution I use which is far more efficient - and really should be the default implementation of any function when used with a set based query - which is to use a cross applied table function. Seems the topic is still active so hopefully this is useful to someone.

Example runtime on a few of the answers so far based on running recursive set based queries or scalar function, based on 1m rows test set removing the chars from a random newid, ranges from 34s to 2m05s for the WHILE loop examples and from 1m3s to {forever} for the function examples.

Using a table function with cross apply achieves the same goal in 10s. You may need to adjust it to suit your needs such as the max length it handles.

Function:

CREATE FUNCTION [dbo].[RemoveChars](@InputUnit VARCHAR(40))
RETURNS TABLE
AS
RETURN
    (
        WITH Numbers_prep(Number) AS
            (
                SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
            )
        ,Numbers(Number) AS
            (
                SELECT TOP (ISNULL(LEN(@InputUnit),0))
                    row_number() OVER (ORDER BY (SELECT NULL))
                FROM Numbers_prep a
                    CROSS JOIN Numbers_prep b
            )
        SELECT
            OutputUnit
        FROM
            (
                SELECT
                    substring(@InputUnit,Number,1)
                FROM  Numbers
                WHERE substring(@InputUnit,Number,1) like '%[0-9]%'
                ORDER BY Number
                FOR XML PATH('')
            ) Sub(OutputUnit)
    )

Usage:

UPDATE t
SET column = o.OutputUnit
FROM ##t t
CROSS APPLY [dbo].[RemoveChars](t.column) o
查看更多
听够珍惜
4楼-- · 2018-12-31 21:21

You can use PATINDEX to find the first index of the pattern (string's) occurrence. Then use STUFF to stuff another string into the pattern(string) matched.

Loop through each row. Replace each illegal characters with what you want. In your case replace non numeric with blank. The inner loop is if you have more than one illegal character in a current cell that of the loop.

DECLARE @counter int

SET @counter = 0

WHILE(@counter < (SELECT MAX(ID_COLUMN) FROM Table))
BEGIN  

    WHILE 1 = 1
    BEGIN
        DECLARE @RetVal varchar(50)

        SET @RetVal =  (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column),1, '')
        FROM Table
        WHERE ID_COLUMN = @counter)

        IF(@RetVal IS NOT NULL)       
          UPDATE Table SET
          Column = @RetVal
          WHERE ID_COLUMN = @counter
        ELSE
            break
    END

    SET @counter = @counter + 1
END

Caution: This is slow though! Having a varchar column may impact. So using LTRIM RTRIM may help a bit. Regardless, it is slow.

Credit goes to this StackOverFlow answer.

EDIT Credit also goes to @srutzky

Edit (by @Tmdean) Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still iterates the max of the number of non-numeric characters in a single row, so it's not ideal, but I think it should be acceptable in most situations.

WHILE 1 = 1 BEGIN
    WITH q AS
        (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
        FROM Table)
    UPDATE Table
    SET Column = STUFF(Column, q.n, 1, '')
    FROM q
    WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

    IF @@ROWCOUNT = 0 BREAK;
END;

You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)

DECLARE @done bit = 0;
WHILE @done = 0 BEGIN
    WITH q AS
        (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
        FROM Table
        WHERE COALESCE(Scrubbed_Column, 0) = 0)
    UPDATE Table
    SET Column = STUFF(Column, q.n, 1, ''),
        Scrubbed_Column = 0
    FROM q
    WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

    IF @@ROWCOUNT = 0 SET @done = 1;

    -- if Scrubbed_Column is still NULL, then the PATINDEX
    -- must have given 0
    UPDATE table
    SET Scrubbed_Column = CASE
        WHEN Scrubbed_Column IS NULL THEN 1
        ELSE NULLIF(Scrubbed_Column, 0)
    END;
END;

If you don't want to change your schema, this is easy to adapt to store intermediate results in a table valued variable which gets applied to the actual table at the end.

查看更多
栀子花@的思念
5楼-- · 2018-12-31 21:21

If you are doing this just for a parameter coming into a Stored Procedure, you can use the following:

while PatIndex('%[^0-9]%', @Param) > 0
    select  @Param = Replace(@Param, Substring(@Param, PatIndex('%[^0-9]%', @Param), 1), '')
查看更多
荒废的爱情
6楼-- · 2018-12-31 21:22

I think a simpler and faster approach is iterate by each character of the alphabet:

DECLARE @i int
SET @i = 0

WHILE(@i < 256)
BEGIN  

    IF char(@i) NOT IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.')      

      UPDATE Table SET Column = replace(Column, char(@i), '')

    SET @i = @i + 1

END
查看更多
梦该遗忘
7楼-- · 2018-12-31 21:23

Instead of stripping out the found character by its sole position, using Replace(Column, BadFoundCharacter, '') could be substantially faster. Additionally, instead of just replacing the one bad character found next in each column, this replaces all those found.

WHILE 1 = 1 BEGIN
    UPDATE dbo.YourTable
    SET Column = Replace(Column, Substring(Column, PatIndex('%[^0-9.-]%', Column), 1), '')
    WHERE Column LIKE '%[^0-9.-]%'
    If @@RowCount = 0 BREAK;
END;

I am convinced this will work better than the accepted answer, if only because it does fewer operations. There are other ways that might also be faster, but I don't have time to explore those right now.

查看更多
登录 后发表回答