The following code results in an infinite loop or extremely slow execution:
CREATE FUNCTION [dbo].[CleanUriPart]
(
    -- Add the parameters for the function here
    @DirtyUriPart nvarchar(200)
)
RETURNS nvarchar(200)
AS
BEGIN;
    -- Declare the return variable here
    DECLARE @Result nvarchar(200);
    DECLARE @i int;
    SET @i = 1;
    WHILE 1 = 1
    BEGIN;
        SET @i = PATINDEX('%[^a-zA-Z0-9.~_-]%', @DirtyUriPart COLLATE Latin1_General_BIN);
        IF @i > 0
            SET @DirtyUriPart = STUFF(@DirtyUriPart, @i, 1, '-');
        ELSE
            BREAK;
    END;
    -- Add the T-SQL statements to compute the return value here
    SELECT @Result = @DirtyUriPart;
    -- Return the result of the function
    RETURN @Result;
END;
The input/output should be as follows:
- 'abcdef' -> 'abcdef' works ok
- 'abc-def' -> 'abc-def' results in infinite loop
- 'abc*def' -> 'abc-def' results in infinite loop
- etc.
Please help!
SELECT PATINDEX('%[^a-]%', N'aaa-def' COLLATE Latin1_General_BIN),
PATINDEX('%[^-a]%', N'aaa-def' COLLATE Latin1_General_BIN),
PATINDEX('%[^a-]%', 'aaa-def' COLLATE Latin1_General_BIN),
PATINDEX('%[^-a]%', 'aaa-def' COLLATE Latin1_General_BIN)
Returns
----------- ----------- ----------- -----------
1           5           5           5
So it seems that for varchar datatypes a trailing - is treated as being part of a set, whereas for nvarchar it is ignored (treated as a malformed range, since the a is ignored too?).
The BOL entry for LIKE doesn't explicitly talk about how to use - within [] to get it to be treated as part of a set, but it does have the example
LIKE '[-acdf]'
to match -, a, c, d, or f, so I assume that it needs to be the first item in a set (i.e. that [^a-zA-Z0-9.~_-] needs to be altered to [^-a-zA-Z0-9.~_]). That also matches the result of my testing above.
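As a minimal sketch, assuming the leading-hyphen behaviour seen in the test above also holds for the full pattern, the function from the question could be rewritten along these lines. The only functional change is the position of the hyphen in the character class; the template comments and the @Result variable are dropped for brevity.

```sql
-- Sketch only: same logic as the function in the question, with the hyphen
-- moved to the front of the character class so it is read as a literal
-- character rather than as part of a range.
CREATE FUNCTION [dbo].[CleanUriPart]
(
    @DirtyUriPart nvarchar(200)
)
RETURNS nvarchar(200)
AS
BEGIN
    DECLARE @i int;

    WHILE 1 = 1
    BEGIN
        -- Position of the first character outside the allowed set, or 0
        SET @i = PATINDEX('%[^-a-zA-Z0-9.~_]%', @DirtyUriPart COLLATE Latin1_General_BIN);

        IF @i > 0
            -- Replace the offending character with a hyphen
            SET @DirtyUriPart = STUFF(@DirtyUriPart, @i, 1, '-');
        ELSE
            BREAK;
    END;

    RETURN @DirtyUriPart;
END;
```

With the hyphen recognised as an allowed character, each pass through the loop either replaces a disallowed character with an allowed one or exits, so the loop should terminate after at most one pass per character of the input.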
Any chance @DirtyUriPart can evaluate to NULL? In the PATINDEX function, if either pattern or expression is NULL, PATINDEX returns NULL, and a NULL in this case will cause an infinite loop.
It looks like you could fix the problem by casting @DirtyUriPart as VARCHAR(200) in PATINDEX, which will cause the dash to be recognized along with the other characters in the class:
DECLARE @DirtyUriPart nvarchar(200)='abc-def';
-- Returns 0
SELECT PATINDEX('%[^a-zA-Z0-9.~_-]%', CAST(@DirtyUriPart AS VARCHAR(200)) COLLATE Latin1_General_BIN);
-- Returns 4
SELECT PATINDEX('%[^a-zA-Z0-9.~_-]%', @DirtyUriPart COLLATE Latin1_General_BIN);
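The second result also suggests why the original loop never ends. The following is a small sketch, assuming PATINDEX behaves as reported above for the nvarchar input (the variable names @s and @i are just for illustration):

```sql
-- Sketch: one iteration of the loop body from the question, run by hand.
DECLARE @s nvarchar(200) = N'abc-def';
DECLARE @i int;

-- With the nvarchar input, the existing hyphen is reported as a
-- disallowed character (position 4 according to the result above).
SET @i = PATINDEX('%[^a-zA-Z0-9.~_-]%', @s COLLATE Latin1_General_BIN);

-- STUFF then replaces that hyphen with another hyphen, so the string is
-- unchanged and the next PATINDEX call finds the same position again.
SELECT @i AS FirstMatch,
       @s AS BeforeStuff,
       STUFF(@s, @i, 1, '-') AS AfterStuff;
```

If BeforeStuff and AfterStuff come back identical while FirstMatch is greater than 0, the WHILE loop makes no progress, which is consistent with the hang described in the question for any input that contains a hyphen or gets one inserted.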