SQL Server 2008 R2 - Scalar UDF results in infinit

2019-08-03 08:17发布

问题:

The following code is resulting in an infinite loop or really really slow execution:

CREATE FUNCTION [dbo].[CleanUriPart] 
(
    -- Add the parameters for the function here
    @DirtyUriPart nvarchar(200)
)

RETURNS nvarchar(200)
AS
BEGIN;
    -- Declare the return variable here
    DECLARE @Result nvarchar(200);

DECLARE @i int;

SET @i = 1;

WHILE 1 = 1
BEGIN;
    SET @i = PATINDEX('%[^a-zA-Z0-9.~_-]%', @DirtyUriPart COLLATE Latin1_General_BIN);
    IF @i > 0
        SET @DirtyUriPart = STUFF(@DirtyUriPart, @i, 1, '-');
    ELSE
        BREAK;
END;

-- Add the T-SQL statements to compute the return value here
SELECT @Result = @DirtyUriPart;

-- Return the result of the function
RETURN @Result;

END;

The input/output should be as follows:

  • 'abcdef' -> 'abcdef' works ok
  • 'abc-def' -> 'abc-def' results in infinite loop
  • 'abc*def' -> 'abc-def' results in infinite loop
  • etc.

Please help!

回答1:

SELECT PATINDEX('%[^a-]%', N'aaa-def' COLLATE Latin1_General_BIN),
       PATINDEX('%[^-a]%', N'aaa-def' COLLATE Latin1_General_BIN), 
       PATINDEX('%[^a-]%', 'aaa-def' COLLATE Latin1_General_BIN),
       PATINDEX('%[^-a]%', 'aaa-def' COLLATE Latin1_General_BIN)

Returns

----------- ----------- ----------- -----------
1           5           5           5

So it seems that for varchar datatypes a trailing - is treated as being part of a set whereas for nvarchar it is ignored (treated as a malformed range as a is ignored too?)

The BOL entry for LIKE doesn't explicitly talk about how to use - within [] to get it to be treated as part of a set but does have the example

LIKE '[-acdf]'

to match -, a, c, d, or f so I assume that it needs to be the first item in a set (i.e. that [^a-zA-Z0-9.~_-] needs to be altered to [^-a-zA-Z0-9.~_]). That also matches the result of my testing above.



回答2:

Any chance @DirtyUriPart can evaluate to NULL? ON the PATINDEX function, if either pattern or expression is NULL, PATINDEX returns NULL and a NULL in this case will cause a infinite loop



回答3:

It looks like you could fix the problem by casting @DirtyUriPart as VARCHAR(200) in PATINDEX, which will cause the dash to be recognized along with the other characters in the class:

DECLARE @DirtyUriPart nvarchar(200)='abc-def';

-- Returns 0
SELECT PATINDEX('%[^a-zA-Z0-9.~_-]%', CAST(@DirtyUriPart AS VARCHAR(200)) COLLATE Latin1_General_BIN);

-- Returns 4
SELECT PATINDEX('%[^a-zA-Z0-9.~_-]%', @DirtyUriPart COLLATE Latin1_General_BIN);