可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Take the following example data:
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113' AS [Subject]
What i would like to do is extract this data as follows:
As you can see, i need to extract the Ref, Type & OurRef as seperate columns to ensure efficient set based SQL when processing the resulting emails.
Usually for this scenario i would use a function such as this:
CREATE FUNCTION dbo.fnParseString (
@Section SMALLINT ,
@Delimiter CHAR ,
@Text VARCHAR(MAX)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @NextPos SMALLINT;
DECLARE @LastPos SMALLINT;
DECLARE @Found SMALLINT;
SELECT @NextPos = CHARINDEX(@Delimiter, @Text, 1) ,
@LastPos = 0 ,
@Found = 1
WHILE @NextPos > 0
AND ABS(@Section) <> @Found
SELECT @LastPos = @NextPos ,
@NextPos = CHARINDEX(@Delimiter, @Text, @NextPos + 1) ,
@Found = @Found + 1
RETURN LTRIM(RTRIM(CASE
WHEN @Found <> ABS(@Section) OR @Section = 0 THEN NULL
WHEN @Section > 0 THEN SUBSTRING(@Text, @LastPos + 1, CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos ELSE @NextPos - @LastPos - 1 END)
ELSE SUBSTRING(@Text, @LastPos + 1, CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos ELSE @NextPos - @LastPos - 1 END)
END))
END
For example i then replace the white space before the ref to include a comma and split as follows:
WITH ExampleData
AS ( SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113'
)
SELECT dbo.fnParseString(2, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Ref] ,
dbo.fnParseString(3, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Type] ,
dbo.fnParseString(4, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [OurRef]
FROM ExampleData
As you can see, i have a solution that gets the end-result i'm after, but the use of a messy udf isnt ideal & i was wondering if sql-server has a better way of doing stuff like this - perhaps inline regular expressions? I.e. i think PATINDEX()
accepts regular expressions as a search string - this in conjunction with SUBSTRING()
could do what i need but i dont really know where to start?
Edit: Please note that this is a simplified example, the subject is variable and i'll also be adapting the same technique to parse the body, the body will have 8 items of data that i need to parse out using a variety of delimiters, so this rules out the use of ParseName()
as it only allows 4 parts, and i can’t use fixed length (i.e. substring()
) as the length will be very varied (especially if different helpdesks are involved (which they are) - this is why i was thinking along the lines of PATINDEX()
& SUBSTRING()
回答1:
I suggest to use this:
;WITH CTE
AS
(
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113' AS [Subject]
)
, CTEPart
as
(
SELECT [Subject], REPLACE(SUBSTRING([Subject], 25, 1000), ', ', '.') Part
FROM CTE
)
SELECT
[Subject],
PARSENAME(Part, 1) AS [Ref],
PARSENAME(Part, 2) AS [Type],
PARSENAME(Part, 3) AS [OurRef]
FROM CTEPart
回答2:
After additional work we decided not to use the approach in Art's answer (even though it worked).
We needed a far more powerfull way of validating and extracting the substrings, So I went the Regular Expression via CLR route (Thanks to Pondlife for pointing me in the right direction).
The approach i took was as follows:
First I compiled the following CLR: (Converted to VB from a C# example Here)
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Text.RegularExpressions
Imports System.Text
Partial Public Class UserDefinedFunctions
Public Shared ReadOnly Options As RegexOptions = RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline
<SqlFunction()> _
Public Shared Function RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlBoolean
Dim regex As New Regex(pattern.Value, Options)
Return regex.IsMatch(New String(input.Value))
End Function
<SqlFunction()> _
Public Shared Function RegexReplace(ByVal expression As SqlString, ByVal pattern As SqlString, ByVal replace As SqlString) As SqlString
If expression.IsNull OrElse pattern.IsNull OrElse replace.IsNull Then
Return SqlString.Null
End If
Dim r As New Regex(pattern.ToString())
Return New SqlString(r.Replace(expression.ToString(), replace.ToString()))
End Function
' returns the matching string. Results are separated by 3rd parameter
<SqlFunction()> _
Public Shared Function RegexSelectAll(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchDelimiter As SqlString) As SqlString
Dim regex As New Regex(pattern.Value, Options)
Dim results As Match = regex.Match(New String(input.Value))
Dim sb As New StringBuilder()
While results.Success
sb.Append(results.Value)
results = results.NextMatch()
' separate the results with newline|newline
If results.Success Then
sb.Append(matchDelimiter.Value)
End If
End While
Return New SqlString(sb.ToString())
End Function
' returns the matching string
' matchIndex is the zero-based index of the results. 0 for the 1st match, 1, for 2nd match, etc
<SqlFunction()> _
Public Shared Function RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString
Dim regex As New Regex(pattern.Value, Options)
Dim results As Match = regex.Match(New String(input.Value))
Dim resultStr As String = ""
Dim index As Integer = 0
While results.Success
If index = matchIndex Then
resultStr = results.Value.ToString()
End If
results = results.NextMatch()
index += 1
End While
Return New SqlString(resultStr)
End Function
End Class
I installed this CLR as follows:
EXEC sp_configure
'clr enabled' ,
'1'
GO
RECONFIGURE
USE [db_Utility]
GO
CREATE ASSEMBLY SQL_CLR_RegExp FROM 'D:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\SQL_CLR_RegExp.dll' WITH
PERMISSION_SET = SAFE
GO
-- =============================================
-- Returns 1 or 0 if input matches pattern
-- VB function: RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlBoolean
-- =============================================
CREATE FUNCTION [dbo].[RegexMatch]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX)
)
RETURNS [bit]
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexMatch]
GO
-- =============================================
-- Returns a comma separated string of found objects
-- VB function: RegexReplace(ByVal expression As SqlString, ByVal pattern As SqlString, ByVal replace As SqlString) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexReplace]
(
@expression [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@replace [nvarchar](MAX)
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexReplace]
GO
-- =============================================
-- Returns a comma separated string of found objects
-- VB function: RegexSelectAll(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchDelimiter As SqlString) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectAll]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@matchDelimiter [nvarchar](MAX)
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectAll]
GO
-- =============================================
-- Returns finding matchIndex of a zero based index
-- RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectOne]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@matchIndex [int]
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectOne]
GO
I then wrote the following wrapping function to simplify use:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: <Jordon Pilling>
-- Create date: <30/01/2013>
-- Description: <Calls RegexSelectOne with start and end text and cleans the result>
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectOneWithScrub]
(
@Haystack VARCHAR(MAX),
@StartNeedle VARCHAR(MAX),
@EndNeedle VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @ReturnStr VARCHAR(MAX)
--#### Extract text from HayStack using Start and End Needles
SET @ReturnStr = dbo.RegexSelectOne(@Haystack, REPLACE(@StartNeedle, ' ','\s') + '((.|\n)+?)' + REPLACE(@EndNeedle, ' ','\s'), 0)
--#### Remove the Needles
SET @ReturnStr = REPLACE(@ReturnStr, @StartNeedle, '')
SET @ReturnStr = REPLACE(@ReturnStr, @EndNeedle, '')
--#### Trim White Space
SET @ReturnStr = LTRIM(RTRIM(@ReturnStr))
--#### Trim Line Breaks and Carriage Returns
SET @ReturnStr = dbo.SuperTrim(@ReturnStr)
RETURN @ReturnStr
END
GO
This allowed usage as follows:
DECLARE @Subject VARCHAR(250) = 'HelpDesk Call Reference F0012345, Call Update, 40111'
DECLARE @Ref VARCHAR(250) = NULL
IF dbo.RegexMatch(@Subject, '^HelpDesk\sCall\sReference\sF[0-9]{7},\s(Call\sResolved|Call\sUpdate|New\scall\slogged),(|\s+)([0-9]+|unknown)$') = 1
SET @Ref = ISNULL(dbo.RegexSelectOneWithScrub(@Subject, 'HelpDesk Call Reference', ','), 'Invalid (#1)')
ELSE
SET @Ref = 'Invalid (#2)'
SELECT @Ref
This was far quicker to use for multiple searches, and far more powerfull when dealing with lots of text with differeent start and end phrases etc.
回答3:
This example is Oracle query. All functions used are ANSI SQL standard which will work in any SQL. This example cuts only REF part of the string. You simply have to repeat all steps for Type, OutRef, etc... This example assumes that your ref will always contain 0-zero, and there will be always ',' after ref, which can be replaced with whitespace or any other character. THe NVL() cna be used: INSTR(str, NVL(',', ' ')...). I think this approach is more generic then hardcoding values into SUBSTR...:
SELECT str, SUBSTR(str, ref_start_pos, ref_end_pos) final_ref
FROM
(
SELECT str, ref_start_pos, INSTR(str, ',', ref_start_pos)-ref_start_pos AS ref_end_pos
FROM
(
SELECT str, INSTR(str, '0')-1 AS ref_start_pos
FROM
(
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS str
FROM dual
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
FROM dual
)
)
)
/
SQL>
STR | FINAL_REF
------------------------------------------------------------------------
HelpDesk Call Reference F0012345, Call Update, 40111 | F0012345
HelpDesk Call Reference F0012346, Call Resolved, 40112 | F0012346
SQL Server Version (Added by OP):
SELECT [str] ,
SUBSTRING([str], ref_start_pos, ref_end_pos) AS final_ref
FROM ( SELECT [str] ,
ref_start_pos ,
CHARINDEX(',', [str], ref_start_pos) - ref_start_pos AS ref_end_pos
FROM ( SELECT [str] ,
CHARINDEX('Reference', [str]) + 10 AS ref_start_pos
FROM ( SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [str]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [str]
) AS T1
) AS T2
) AS T3