Need Better Solution to remove special characters

2019-09-17 16:41发布

问题:

Is there any better way to remove all the special characters and numbers in a column, not limited to one or two almost to remove all the special characters and numbers. As of now I I am using this query (Jst Logic part). Let me know is there any better choice as my table contains minimum 5 million records

Declare @name varchar(1000) = '231323Lig%$%$h$%t'
Declare @dumy varchar(1000)

while(PATINDEX('%[0-9]%',@name)<>0)
    SET @name   = stuff(@name,PATINDEX('%[0-9]%',@name),1,'')


while(PATINDEX('%[A-Z]%',@name)<>0)
begin
    SET @dumy = isnull(@dumy,'')+substring(@name,PATINDEX('%[A-Z]%',@name),1)
    SET @name = stuff(@name,PATINDEX('%[A-Z]%',@name),1,'')
end

Set @name = @dumy

Select @name 'Clean'

回答1:

Try this function:

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

Call it like this:

Select dbo.RemoveNonAlphaCharacters('231323Lig%$%$h$%t')

Output:


Clean
Light

Update

If you want to do it for entire table just try like this

Select dbo.RemoveNonAlphaCharacters(ColumnName),OtherColumn1,OtherColumn2 
FROM Table1


回答2:

You can simplify your function to one WHILE loop:

DECLARE @String NVARCHAR(MAX) = '231323Lig%$%$h$%t'
DECLARE @Expression NVARCHAR(32) = '%[^A-Z]%'

WHILE PATINDEX(@Expression, @String) > 0
    SET @String = STUFF(@String, PATINDEX(@Expression, @String), 1, '')

RETURN @String

A CLR function could be faster than the pure T-SQL implementation.

Regex.Replace(str, "[^a-zA-Z]+", "", RegexOptions.Compiled)