How to use regular expressions in SQL Server?

Posted 2020-02-05 09:33

Question:

Is it possible to write efficient queries that use the regular expression feature set? I have data in my table that is not in the correct format. For example, the Title column contains "Cable 180° To 90° Serial ATA Cable", and the Id column contains "123234+", data in exponential format. Is it possible to write queries using regular expressions in SQL Server 2008?

Answer 1:

You need to use the following functions, usually in some combination of the three:

  1. patindex
  2. charindex
  3. substring

In response to your comment above, patindex should not return 0 when the pattern is found. patindex returns the start position of the specified pattern, so if it finds a match, it returns an integer > 0.

EDIT:

Also, len(string) and reverse(string) come in handy on specific occasions.
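
As a rough illustration of how these functions combine, here is a sketch that pulls the first run of digits out of a string. The variable names are made up for the example, and the sample value is the Title from the question; in practice it would come from the column itself.

    -- Sample value (taken from the question); normally this is a column.
    DECLARE @title nvarchar(100) = N'Cable 180° To 90° Serial ATA Cable';

    -- patindex: 1-based position of the first digit, or 0 if there is none.
    DECLARE @start int = PATINDEX(N'%[0-9]%', @title);

    -- Length of the digit run: position of the first non-digit after @start.
    -- The appended 'x' is a sentinel so a match always exists.
    DECLARE @len int =
        PATINDEX(N'%[^0-9]%', SUBSTRING(@title, @start, LEN(@title)) + N'x') - 1;

    -- substring: cut the digits out; NULL when the string has no digits.
    SELECT CASE WHEN @start > 0
                THEN SUBSTRING(@title, @start, @len)
           END AS first_number;   -- 180 for the sample value

    -- charindex: finds a literal substring rather than a pattern.
    SELECT CHARINDEX(N' To ', @title) AS to_position;

The same bracket syntax works in a LIKE filter, for example WHERE Id LIKE '%[^0-9]%' to find Id values that contain anything other than digits.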



Answer 2:

With a CLR .NET project published to SQL Server, it is extremely efficient. Having used a CLR project written in VB.NET with our SQL Server 2005 instance for the past two years, I have found that every T-SQL scalar function I replaced with a .NET version ran dramatically faster. I have used it for advanced date manipulation, formatting and parsing, string formatting and parsing, MD5 hash generation, vector lengths, a string JOIN aggregate function, a split table-valued function, and even bulk loading from serialized DataTables via a shared folder (which is amazingly fast).

Since RegEx is not built into SQL Server, I can only assume the CLR version is as efficient as a compiled EXE running the same regex, which is to say extremely fast.

I will share a code file from my VB.Net CLR project that allows some RegEx functionality. This code would be part of a .NET CLR DLL that is published to your server.

Function Summary

Regex_IsMatch(data, pattern, options) AS tinyint (0/1 result)

E.g. SELECT dbo.Regex_IsMatch('Darren','[trwq]en$',NULL) -- returns 1 / true

Regex_Group(data, pattern, groupname, options) AS nvarchar(max) (capture group value returned)

E.g. SELECT dbo.Regex_Group('Cable 180+e10 to 120+e3',' (?<n>[0-9]+)\+e[0-9]+','n',NULL) -- returns '180'

Regex_Replace(data, pattern, replacement, options) AS nvarchar(max) (returns modified string)

E.g. SELECT dbo.Regex_Replace('Cable 180+e10 to 120+e3',' (?<n>[0-9]+)\+e(?<e>[0-9]+)',' ${e}:${n}',NULL) -- returns 'Cable 10:180 to 3:120'

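Imports System
Imports System.Collections
Imports System.Data.SqlTypes
Imports System.Text.RegularExpressions
Imports Microsoft.SqlServer.Server

Partial Public Class UserDefinedFunctions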

    ''' <summary>
    ''' Returns 1 (true) if the pattern matches the data, otherwise 0 (false).
    ''' Returns NULL if Data is NULL.
    ''' options example, full or partial names can be used after slashes or hyphens with or without spaces, some are exclusive of each other "/ic /ex -s" = "\ignorecase -explicitcapture/singleline"
    ''' </summary>
    ''' <param name="data"></param>
    ''' <param name="pattern"></param>
    ''' <param name="options">options example, full or partial names can be used after slashes or hyphens with or without spaces, some are exclusive of each other "/ic /ex -s" = "\ignorecase -explicitcapture/singleline"</param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    <Microsoft.SqlServer.Server.SqlFunction()> _
    Public Shared Function Regex_IsMatch(data As SqlChars, pattern As SqlChars, options As SqlString) As SqlByte
        If pattern.IsNull Then
            Throw New Exception("Pattern Parameter in ""RegEx_IsMatch"" cannot be NULL")
        End If
        If data.IsNull Then
            Return SqlByte.Null
        Else
            Return CByte(If(Regex.IsMatch(data.Value, pattern.Value, Regex_Options(options)), 1, 0))
        End If
    End Function

    ''' <summary>
    ''' Returns the Value of a RegularExpression Pattern Group by Name or Number.
    ''' Group needs to be captured explicitly. Example Pattern "[a-z](?&lt;m&gt;[0-9][0-9][0-9][0-9])" to capture the numeric portion of an engineering number by the group called "m".
    ''' Returns NULL if The Capture was not successful.
    ''' Returns NULL if Data is NULL.
    ''' options example, full or partial names can be used after slashes or hyphens with or without spaces, some are exclusive of each other "/ic /ex -s" = "\ignorecase -explicitcapture/singleline"
    ''' </summary>
    ''' <param name="data"></param>
    ''' <param name="pattern"></param>
    ''' <param name="groupName">Name used in the explicit capture group</param>
    ''' <param name="options">options example, full or partial names can be used after slashes or hyphens with or without spaces, some are exclusive of each other "/ic /ex -s" = "\ignorecase -explicitcapture/singleline"</param>
    <Microsoft.SqlServer.Server.SqlFunction()> _
    Public Shared Function Regex_Group(data As SqlChars, pattern As SqlChars, groupName As SqlString, options As SqlString) As SqlChars
        If pattern.IsNull Then
            Throw New Exception("Pattern Parameter in ""Regex_Group"" cannot be NULL")
        End If
        If groupName.IsNull Then
            Throw New Exception("GroupName Parameter in ""Regex_Group"" cannot be NULL")
        End If
        If data.IsNull Then
            Return SqlChars.Null
        Else
            Dim m As Match = Regex.Match(data.Value, pattern.Value, Regex_Options(options))
            If m.Success Then
                Dim g As Group
                If IsNumeric(groupName.Value) Then
                    g = m.Groups(CInt(groupName.Value))
                Else
                    g = m.Groups(groupName.Value)
                End If
                If g.Success Then
                    Return New SqlChars(g.Value)
                Else ' group did not return or was not found.
                    Return SqlChars.Null
                End If
            Else 'match failed.
                Return SqlChars.Null
            End If
        End If
    End Function

    ''' <summary>
    ''' Does the equivalent of Regex.Replace in .NET.
    ''' Replacement String Replacement Markers are done in this format "${test}" = Replaces the capturing group (?&lt;test&gt;...)
    ''' If the replacement pattern is $1 or $2 then it replaces the first or second captured group by position.
    ''' Returns NULL if Data is NULL.
    ''' options example, full or partial names can be used after slashes or hyphens with or without spaces, some are exclusive of each other "/ic /ex -s" = "\ignorecase -explicitcapture/singleline"
    ''' </summary>
    ''' <param name="data"></param>
    ''' <param name="pattern"></param>
    ''' <param name="replacement">Replacement String Replacement Markers are done in this format "${test}" = Replaces the capturing group (?&lt;test&gt;...). If the replacement pattern is $1 or $2 then it replaces the first or second captured group by position.</param>
    ''' <param name="options">options example, full or partial names can be used after slashes or hyphens with or without spaces, some are exclusive of each other "/ic /ex -s" = "\ignorecase -explicitcapture/singleline"</param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    <SqlFunction()> _
    Public Shared Function Regex_Replace(data As SqlChars, pattern As SqlChars, replacement As SqlChars, options As SqlString) As SqlChars
        If pattern.IsNull Then
            Throw New Exception("Pattern Parameter in ""Regex_Replace"" cannot be NULL")
        End If
        If replacement.IsNull Then
            Throw New Exception("Replacement Parameter in ""Regex_Replace"" cannot be NULL")
        End If
        If data.IsNull Then
            Return SqlChars.Null
        Else
            Return New SqlChars(Regex.Replace(data.Value, pattern.Value, replacement.Value, Regex_Options(options)))
        End If
    End Function

    ''' <summary>
    ''' Buffered list of options by name for speed.
    ''' </summary>
    Private Shared ReadOnly m_Regex_Buffered_Options As New Generic.Dictionary(Of String, RegexOptions)(StringComparer.Ordinal)
    ''' <summary>
    ''' Default regex options used when options value is NULL or an Empty String
    ''' </summary>
    Private Shared ReadOnly m_Regex_DefaultOptions As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.ExplicitCapture Or RegexOptions.Multiline

    ''' <summary>
    ''' Get the regular expressions options to use by a passed string of data.
    ''' Formatted like command line arguments.
    ''' </summary>
    ''' <param name="options">options example, full or partial names can be used after slashes or hyphens with or without spaces, some are exclusive of each other "/ic /ex -s" = "\ignorecase -explicitcapture/singleline"</param>
    Private Shared Function Regex_Options(options As SqlString) As RegexOptions
        Return Regex_Options(If(options.IsNull, "", options.Value))
    End Function

    ''' <summary>
    ''' Get the regular expressions options to use by a passed string of data.
    ''' Formatted like command line arguments.
    ''' </summary>
    ''' <param name="options">options example, full or partial names can be used after slashes or hyphens with or without spaces, some are exclusive of each other "/ic /ex -s" = "\ignorecase -explicitcapture/singleline"</param>
    Private Shared Function Regex_Options(options As String) As RegexOptions
        'empty options string is considered default options.
        If options Is Nothing OrElse options = "" Then
            Return m_Regex_DefaultOptions
        Else
            Dim out As RegexOptions
            If m_Regex_Buffered_Options.TryGetValue(options, out) Then
                Return out
            Else
                'must build options and store them
                If options Like "*[/\-]n*" Then
                    out = RegexOptions.None
                End If
                If options Like "*[/\-]s*" Then
                    out = out Or RegexOptions.Singleline
                End If
                If options Like "*[/\-]m*" Then
                    out = out Or RegexOptions.Multiline
                End If
                If options Like "*[/\-]co*" Then
                    out = out Or RegexOptions.Compiled
                End If
                If options Like "*[/\-]c[ui]*" Then
                    out = out Or RegexOptions.CultureInvariant
                End If
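                If options Like "*[/\-]ecma*" Then
                    out = out Or RegexOptions.ECMAScript
                ElseIf options Like "*[/\-]e[xc]*" Then
                    'only tested when ECMAScript is absent: "/ecma" would otherwise match "e[xc]" as well,
                    'and ECMAScript cannot be combined with ExplicitCapture
                    out = out Or RegexOptions.ExplicitCapture
                End If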
                If options Like "*[/\-]i[c]*" OrElse options Like "*[/\-]ignorec*" Then
                    out = out Or RegexOptions.IgnoreCase
                End If
                If options Like "*[/\-]i[pw]*" OrElse options Like "*[/\-]ignore[pw]*" Then
                    out = out Or RegexOptions.IgnorePatternWhitespace
                End If
                If options Like "*[/\-]r[tl]*" Then
                    out = out Or RegexOptions.RightToLeft
                End If
                'store the options for next call (for speed)
                m_Regex_Buffered_Options(options) = out
                Return out
            End If
        End If
    End Function

End Class
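
For completeness, publishing the DLL comes down to something like the T-SQL below. Treat it as a sketch under assumptions: the assembly name RegexClr and the file path are hypothetical, SAFE is assumed to be an acceptable permission set on your server, and the class is assumed to have no root namespace (VB projects often add one, in which case the EXTERNAL NAME needs the form RegexClr.[RootNamespace.UserDefinedFunctions].Regex_IsMatch).

    -- Enable CLR integration (it is off by default).
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;
    GO

    -- Hypothetical assembly name and path to the compiled DLL.
    CREATE ASSEMBLY RegexClr
    FROM 'C:\clr\RegexClr.dll'
    WITH PERMISSION_SET = SAFE;
    GO

    -- SqlChars maps to nvarchar(max), SqlString to nvarchar(4000), SqlByte to tinyint.
    CREATE FUNCTION dbo.Regex_IsMatch
        (@data nvarchar(max), @pattern nvarchar(max), @options nvarchar(4000))
    RETURNS tinyint
    AS EXTERNAL NAME RegexClr.UserDefinedFunctions.Regex_IsMatch;
    GO

    -- Example call from the function summary above.
    SELECT dbo.Regex_IsMatch('Darren', '[trwq]en$', NULL);

Regex_Group and Regex_Replace would be registered the same way, each returning nvarchar(max).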