Perform regex (replace) in an SQL query

2019-01-08 23:32发布

问题:

What is the best way to replace all '&lt' with < in a given database column? Basically perform s/&lt[^;]/</gi

Notes:

  • must work in MS SQL Server 2000
  • Must be repeatable (and not end up with <;;;;;;;;;)

回答1:

Some hacking required but we can do this with LIKE, PATINDEX, LEFT AND RIGHT and good old string concatenation.

create table test
(
    id int identity(1, 1) not null,
    val varchar(25) not null
)

insert into test values ('&lt; <- ok, &lt <- nok')

while 1 = 1
begin
    update test
        set val = left(val, patindex('%&lt[^;]%', val) - 1) +
                      '&lt;' +
                      right(val, len(val) - patindex('%&lt[^;]%', val) - 2)
    from test
    where val like '%&lt[^;]%'

    IF @@ROWCOUNT = 0 BREAK
end

select * from test

Better is that this is SQL Server version agnostic and should work just fine.



回答2:

I think this can be done much cleaner if you use different STUFF :)

create table test
(
    id int identity(1, 1) not null,
    val varchar(25) not null
)

insert into test values ('&lt; <- ok, &lt <- nok')

WHILE 1 = 1
BEGIN
    UPDATE test SET
        val = STUFF( val , PATINDEX('%&lt[^;]%', val) + 3 , 0 , ';' )
    FROM test
    WHERE val LIKE '%&lt[^;]%'

    IF @@ROWCOUNT = 0 BREAK
END

select * from test


回答3:

How about:

    UPDATE tableName
    SET columName = REPLACE(columName , '&lt', '&lt;')
    WHERE columnName LIKE '%lt%'
    AND columnName NOT LIKE '%lt;%'

Edit:

I just realized this will ignore columns with partially correct &lt; strings.

In that case you can ignore the second part of the where clause and call this afterward:

    UPDATE tableName
    SET columName = REPLACE(columName , '&lt;;', '&lt;')


回答4:

This article covers how to create a simple Regex Replace function that you can use in SQL 2000 (and 2005 with simple tweak) that can assist you.



回答5:

If MSSQL's regex flavor supports negative lookahead, that would be The Right Way to approach this.

s/&lt(?!;)/&lt;/gi

will catch all instances of &lt which are not followed by a ; (even if they're followed by nothing, which [^;] would miss) and does not capture the following non-; character as part of the match, eliminating the issue mentioned in the comments on the original question of that character being lost in the replacement.

Unfortunately, I don't use MSSQL, so I have no idea whether it supports negative lookahead or not...



回答6:

Very specific to this pattern, but I have done similar to this in the past:

REPLACE(REPLACE(columName, '&lt;', '&lt'), '&lt', '&lt;')

broader example (encode characters which may be inappropriate in a TITLE attribute)

REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    columName
    -- Remove existing encoding:
    , '&amp;', '&')
    , '&#34;', '"')
    , '&#39;', '''')
    -- Reinstate/Encode:
    , '&', '&amp;')
    -- Encode:
    , '"', '&#34;')
    , '''', '&#39;')
    , ' ', '%20')
    , '<', '%3C')
    , '>', '%3E')
    , '/', '%2F')
    , '\', '%5C')