Consider the following sample data:
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113' AS [Subject]
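What I would like to do is extract the Reference, Type and OurRef from each subject as separate columns, so that the emails retrieved can be processed with efficient set-based SQL.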
Normally in this situation I would use a function like this:
CREATE FUNCTION dbo.fnParseString (
@Section SMALLINT ,
@Delimiter CHAR ,
@Text VARCHAR(MAX)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @NextPos SMALLINT;
DECLARE @LastPos SMALLINT;
DECLARE @Found SMALLINT;
SELECT @NextPos = CHARINDEX(@Delimiter, @Text, 1) ,
@LastPos = 0 ,
@Found = 1
WHILE @NextPos > 0
AND ABS(@Section) <> @Found
SELECT @LastPos = @NextPos ,
@NextPos = CHARINDEX(@Delimiter, @Text, @NextPos + 1) ,
@Found = @Found + 1
RETURN LTRIM(RTRIM(CASE
WHEN @Found <> ABS(@Section) OR @Section = 0 THEN NULL
WHEN @Section > 0 THEN SUBSTRING(@Text, @LastPos + 1, CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos ELSE @NextPos - @LastPos - 1 END)
ELSE SUBSTRING(@Text, @LastPos + 1, CASE WHEN @NextPos = 0 THEN DATALENGTH(@Text) - @LastPos ELSE @NextPos - @LastPos - 1 END)
END))
END
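To make the function's behaviour concrete, here is a small usage sketch. It assumes the function above has been created in the current database; the literal string is just one of the sample subjects with its fixed prefix removed.

```sql
-- Sections are numbered from 1, left to right; the result is trimmed.
SELECT dbo.fnParseString(1, ',', 'F0012345, Call Update, 40111') AS [Ref],     -- F0012345
       dbo.fnParseString(2, ',', 'F0012345, Call Update, 40111') AS [Type],    -- Call Update
       dbo.fnParseString(3, ',', 'F0012345, Call Update, 40111') AS [OurRef],  -- 40111
       dbo.fnParseString(4, ',', 'F0012345, Call Update, 40111') AS [Missing]; -- NULL, there is no fourth section
```

Note that the function walks the string again on every call, so each extra column repeats the scan from the start.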
For example, I replace the whitespace before the Ref with a comma plus a space, and then split as follows:
WITH ExampleData
AS ( SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113'
)
SELECT dbo.fnParseString(2, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Ref] ,
dbo.fnParseString(3, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [Type] ,
dbo.fnParseString(4, ',', REPLACE([Subject], 'HelpDesk Call Reference ', 'HelpDesk Call Reference, ')) AS [OurRef]
FROM ExampleData
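As you can see, I have a solution that gets the final result I'm after, but relying on a messy UDF is not ideal, and I was wondering whether SQL Server already has a better way of doing things like this, perhaps with built-in regular expressions? That is, I believe PATINDEX() accepts a regex search string. Would this, together with SUBSTRING(), be able to do what I need? I'm not really sure where to start.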
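Edit: Please note that this is a simplified example. The subject is variable, and I will also be adapting the same technique to parse the body, which will have 8 items of data that I need to parse out using various delimiters. That rules out PARSENAME(), as it only allows 4 parts, and I cannot use fixed lengths with SUBSTRING(), as the lengths will vary greatly (especially if different help desks are involved). This is why I was thinking along the lines of PATINDEX() and SUBSTRING().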
Answer 1:
I would suggest using:
;WITH CTE
AS
(
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [Subject]
UNION ALL
SELECT 'HelpDesk Call Reference F0012347, New call logged, 40113' AS [Subject]
)
, CTEPart
as
(
SELECT [Subject], REPLACE(SUBSTRING([Subject], 25, 1000), ', ', '.') Part
FROM CTE
)
SELECT
[Subject],
PARSENAME(Part, 3) AS [Ref],     -- PARSENAME numbers parts from the right, so the leftmost of three parts is 3
PARSENAME(Part, 2) AS [Type],
PARSENAME(Part, 1) AS [OurRef]   -- rightmost part
FROM CTEPart
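Two properties of PARSENAME matter for this approach: it numbers the parts from the right, and it returns NULL when the string has more than four dot-separated parts. A minimal sketch with literal strings (the values are illustrative only):

```sql
SELECT PARSENAME('F0012345.Call Update.40111', 3) AS [LeftmostPart],  -- F0012345
       PARSENAME('F0012345.Call Update.40111', 1) AS [RightmostPart], -- 40111
       PARSENAME('a.b.c.d.e', 1)                  AS [FiveParts];     -- NULL, more than four parts
```

A value that itself contains a dot would also shift the parts, so this only suits data where the replacement character cannot occur in the fields.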
Answer 2:
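After additional work, we decided not to use the approach from Art's answer (even though it works).
We needed a more robust way of validating and extracting the substrings, so I went down the CLR regular expression route (thanks to Pondlife for pointing me in the right direction).
The approach I took is as follows:
First I compiled the following CLR assembly (converted to VB from a C# example):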
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Text.RegularExpressions
Imports System.Text
Partial Public Class UserDefinedFunctions
Public Shared ReadOnly Options As RegexOptions = RegexOptions.IgnorePatternWhitespace Or RegexOptions.Multiline
<SqlFunction()> _
Public Shared Function RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlBoolean
Dim regex As New Regex(pattern.Value, Options)
Return regex.IsMatch(New String(input.Value))
End Function
<SqlFunction()> _
Public Shared Function RegexReplace(ByVal expression As SqlString, ByVal pattern As SqlString, ByVal replace As SqlString) As SqlString
If expression.IsNull OrElse pattern.IsNull OrElse replace.IsNull Then
Return SqlString.Null
End If
Dim r As New Regex(pattern.ToString())
Return New SqlString(r.Replace(expression.ToString(), replace.ToString()))
End Function
' returns the matching string. Results are separated by 3rd parameter
<SqlFunction()> _
Public Shared Function RegexSelectAll(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchDelimiter As SqlString) As SqlString
Dim regex As New Regex(pattern.Value, Options)
Dim results As Match = regex.Match(New String(input.Value))
Dim sb As New StringBuilder()
While results.Success
sb.Append(results.Value)
results = results.NextMatch()
' separate the results with the supplied matchDelimiter
If results.Success Then
sb.Append(matchDelimiter.Value)
End If
End While
Return New SqlString(sb.ToString())
End Function
' returns the matching string
' matchIndex is the zero-based index of the results. 0 for the 1st match, 1, for 2nd match, etc
<SqlFunction()> _
Public Shared Function RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString
Dim regex As New Regex(pattern.Value, Options)
Dim results As Match = regex.Match(New String(input.Value))
Dim resultStr As String = ""
Dim index As Integer = 0
While results.Success
If index = matchIndex Then
resultStr = results.Value.ToString()
End If
results = results.NextMatch()
index += 1
End While
Return New SqlString(resultStr)
End Function
End Class
I installed this CLR assembly as follows:
EXEC sp_configure
'clr enabled' ,
'1'
GO
RECONFIGURE
USE [db_Utility]
GO
CREATE ASSEMBLY SQL_CLR_RegExp FROM 'D:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\SQL_CLR_RegExp.dll' WITH
PERMISSION_SET = SAFE
GO
-- =============================================
-- Returns 1 or 0 if input matches pattern
-- VB function: RegexMatch(ByVal input As SqlChars, ByVal pattern As SqlString) As SqlBoolean
-- =============================================
CREATE FUNCTION [dbo].[RegexMatch]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX)
)
RETURNS [bit]
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexMatch]
GO
-- =============================================
-- Returns the expression with every match of the pattern replaced
-- VB function: RegexReplace(ByVal expression As SqlString, ByVal pattern As SqlString, ByVal replace As SqlString) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexReplace]
(
@expression [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@replace [nvarchar](MAX)
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexReplace]
GO
-- =============================================
-- Returns all matches as one string, separated by @matchDelimiter
-- VB function: RegexSelectAll(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchDelimiter As SqlString) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectAll]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@matchDelimiter [nvarchar](MAX)
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectAll]
GO
-- =============================================
-- Returns the match at the zero-based position @matchIndex
-- RegexSelectOne(ByVal input As SqlChars, ByVal pattern As SqlString, ByVal matchIndex As SqlInt32) As SqlString
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectOne]
(
@input [nvarchar](MAX) ,
@pattern [nvarchar](MAX) ,
@matchIndex [int]
)
RETURNS [nvarchar](MAX)
WITH EXECUTE AS CALLER
AS EXTERNAL NAME
[SQL_CLR_RegExp].[SQL_CLR_RegExp.UserDefinedFunctions].[RegexSelectOne]
GO
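Once the functions are registered, they can be called like any scalar function. The following is only a sketch with made-up literals, assuming the four functions above were created in the current database:

```sql
SELECT dbo.RegexMatch(N'F0012345', N'^F[0-9]{7}$')                   AS [IsRef],     -- 1
       dbo.RegexReplace(N'Call    Update', N'\s+', N' ')             AS [Collapsed], -- Call Update
       dbo.RegexSelectAll(N'F0012345, F0012346', N'F[0-9]{7}', N'|') AS [AllRefs],   -- F0012345|F0012346
       dbo.RegexSelectOne(N'F0012345, F0012346', N'F[0-9]{7}', 1)    AS [SecondRef]; -- F0012346 (index is zero-based)
```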
I then wrote the following wrapper function to simplify usage:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author: <Jordon Pilling>
-- Create date: <30/01/2013>
-- Description: <Calls RegexSelectOne with start and end text and cleans the result>
-- =============================================
CREATE FUNCTION [dbo].[RegexSelectOneWithScrub]
(
@Haystack VARCHAR(MAX),
@StartNeedle VARCHAR(MAX),
@EndNeedle VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @ReturnStr VARCHAR(MAX)
--#### Extract text from HayStack using Start and End Needles
SET @ReturnStr = dbo.RegexSelectOne(@Haystack, REPLACE(@StartNeedle, ' ','\s') + '((.|\n)+?)' + REPLACE(@EndNeedle, ' ','\s'), 0)
--#### Remove the Needles
SET @ReturnStr = REPLACE(@ReturnStr, @StartNeedle, '')
SET @ReturnStr = REPLACE(@ReturnStr, @EndNeedle, '')
--#### Trim White Space
SET @ReturnStr = LTRIM(RTRIM(@ReturnStr))
--#### Trim Line Breaks and Carriage Returns
SET @ReturnStr = dbo.SuperTrim(@ReturnStr)
RETURN @ReturnStr
END
GO
This allows usage such as:
DECLARE @Subject VARCHAR(250) = 'HelpDesk Call Reference F0012345, Call Update, 40111'
DECLARE @Ref VARCHAR(250) = NULL
IF dbo.RegexMatch(@Subject, '^HelpDesk\sCall\sReference\sF[0-9]{7},\s(Call\sResolved|Call\sUpdate|New\scall\slogged),(|\s+)([0-9]+|unknown)$') = 1
SET @Ref = ISNULL(dbo.RegexSelectOneWithScrub(@Subject, 'HelpDesk Call Reference', ','), 'Invalid (#1)')
ELSE
SET @Ref = 'Invalid (#2)'
SELECT @Ref
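This is far faster than using multiple searches, and far more robust when dealing with large amounts of text with different start and end phrases.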
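The remaining fields can be pulled out in a similar way. As a rough sketch, assuming the CLR functions above are installed and the subject has already passed the RegexMatch validation, the comma-separated segments can be picked by index; the pattern `[^,]+` is my own illustrative choice, not part of the original solution:

```sql
DECLARE @Subject VARCHAR(250) = 'HelpDesk Call Reference F0012345, Call Update, 40111';

-- '[^,]+' matches each run of characters between commas;
-- RegexSelectOne takes a zero-based match index, so 1 is the second segment.
SELECT LTRIM(dbo.RegexSelectOne(@Subject, '[^,]+', 1)) AS [Type],    -- Call Update
       LTRIM(dbo.RegexSelectOne(@Subject, '[^,]+', 2)) AS [OurRef];  -- 40111
```

RegexSelectOne returns an empty string rather than NULL when the requested match does not exist, so validating first keeps the two cases distinguishable.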
Answer 3:
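This example is an Oracle query. The functions used are ordinary string functions with equivalents in most SQL dialects (INSTR and SUBSTR here; SQL Server uses CHARINDEX and SUBSTRING). The example cuts out only the Ref part of the string; you just repeat the same steps for Type, OurRef, and so on. It assumes that your Ref always contains a 0 (zero) and that there is always a ',' after the Ref, which could be replaced by a space or other characters. NVL() can be used: INSTR(STR,NVL( ' ''')...). I think this approach is more generic than hardcoding the start and end values passed to SUBSTR: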
SELECT str, SUBSTR(str, ref_start_pos, ref_end_pos) final_ref
FROM
(
SELECT str, ref_start_pos, INSTR(str, ',', ref_start_pos)-ref_start_pos AS ref_end_pos
FROM
(
SELECT str, INSTR(str, '0')-1 AS ref_start_pos
FROM
(
SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS str
FROM dual
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
FROM dual
)
)
)
/
SQL>
STR | FINAL_REF
------------------------------------------------------------------------
HelpDesk Call Reference F0012345, Call Update, 40111 | F0012345
HelpDesk Call Reference F0012346, Call Resolved, 40112 | F0012346
SQL Server version (added by the OP):
SELECT [str] ,
SUBSTRING([str], ref_start_pos, ref_end_pos) AS final_ref
FROM ( SELECT [str] ,
ref_start_pos ,
CHARINDEX(',', [str], ref_start_pos) - ref_start_pos AS ref_end_pos
FROM ( SELECT [str] ,
CHARINDEX('Reference', [str]) + 10 AS ref_start_pos
FROM ( SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [str]
UNION ALL
SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112' AS [str]
) AS T1
) AS T2
) AS T3
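A variation on the query above is to locate the reference by its shape rather than by a fixed offset, which is closer to the PATINDEX() idea raised in the question. This is a minimal sketch under the assumption that a reference is always the letter F followed by exactly seven digits; PATINDEX accepts LIKE-style wildcards, not full regular expressions, so each digit is spelled out. The third sample row and the `RefPos` alias are made up for illustration.

```sql
WITH ExampleData AS (
    SELECT 'HelpDesk Call Reference F0012345, Call Update, 40111' AS [Subject]
    UNION ALL
    SELECT 'HelpDesk Call Reference F0012346, Call Resolved, 40112'
    UNION ALL
    SELECT 'No reference in this subject'
)
SELECT [Subject],
       -- PATINDEX returns 0 when nothing matches, so guard the SUBSTRING
       CASE WHEN p.RefPos > 0 THEN SUBSTRING([Subject], p.RefPos, 8) END AS [Ref]
FROM ExampleData
CROSS APPLY (SELECT PATINDEX('%F[0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', [Subject]) AS RefPos) AS p;
```

The first two rows yield F0012345 and F0012346; the third yields NULL because no reference is present. The match follows the column's collation, so under a case-insensitive collation a lowercase f followed by seven digits would also match.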
Source: Best way to extract segments / values from VARCHAR field in SET based SQL