I'm currently working on a problem where certain characters need to be cleaned from strings that exist in a table. Normally I'd do a simple UPDATE with a replace, but in this case there are 32 different characters that need to be removed.
I've done some looking around and I can't find any great solutions for quickly cleaning strings that already exist in a table.
Things I've looked into:
Doing a series of nested replaces
This solution is do-able, but for 32 different replaces it would require either some ugly code, or hacky dynamic sql to build a huge series of replaces.
PATINDEX and while loops
As seen in this answer it is possible to mimic a kind of regex replace, but I'm working with a lot of data so I'm hesitant to even trust the improved solution to run in a reasonable amount of time when the data volume is large.
Recursive CTEs
I tried a CTE approuch to this problem, but it didn't run terribly fast once the number of rows got large.
For reference:
CREATE TABLE #BadChar(
id int IDENTITY(1,1),
badString nvarchar(10),
replaceString nvarchar(10)
);
INSERT INTO #BadChar(badString, replaceString) SELECT 'A', '^';
INSERT INTO #BadChar(badString, replaceString) SELECT 'B', '}';
INSERT INTO #BadChar(badString, replaceString) SELECT 's', '5';
INSERT INTO #BadChar(badString, replaceString) SELECT '-', ' ';
CREATE TABLE #CleanMe(
clean_id int IDENTITY(1,1),
DirtyString nvarchar(20)
);
DECLARE @i int;
SET @i = 0;
WHILE @i < 100000 BEGIN
INSERT INTO #CleanMe(DirtyString) SELECT 'AAAAA';
INSERT INTO #CleanMe(DirtyString) SELECT 'BBBBB';
INSERT INTO #CleanMe(DirtyString) SELECT 'AB-String-BA';
SET @i = @i + 1
END;
WITH FixedString (Step, String, cid) AS (
SELECT 1 AS Step, REPLACE(DirtyString, badString, replaceString), clean_id
FROM #BadChar, #CleanMe
WHERE id = 1
UNION ALL
SELECT Step + 1, REPLACE(String, badString, replaceString), cid
FROM FixedString AS T1
JOIN #BadChar AS T2 ON T1.step + 1 = T2.id
Join #CleanMe AS T3 on T1.cid = t3.clean_id
)
SELECT String FROM FixedString WHERE step = (SELECT MAX(STEP) FROM FixedString);
DROP TABLE #BadChar;
DROP TABLE #CleanMe;
Use a CLR
It seems like this is a common solution many people use, but the environment I'm in doesn't make this a very easy one to embark on.
Are there any other ways to go about this I've over looked? Or any improvements upon the methods I've already looked into for this?