Everyone I am trying to write a query to replace all occurrences of a string at the end.
I have some noise words(104 to be exact) that need to be removed from the string if they appear at the end.
Two noise words for example are --Company, LLC
Here are some examples and expected output:
American Company, LLC --Expected output --American (both noise words should be removed)
American LLC,LLC --Expected output -- American
American Company American Company-- American Company American (one noise word occurs in between other words, so it should not be removed)
currently I have this query:
DECLARE @NEWSTRING VARCHAR(max)
DECLARE @NEWSTRINGlength nvarchar(max)
SET @NEWSTRING = 'American Company American Company Company, LLC LLC' ;
SET @NEWSTRINGlength = len(@newstring)
SELECT @NEWSTRINGlength
CREATE TABLE #item (item Nvarchar(250) null)
INSERT INTO #item
SELECT 'Company' as item
UNION ALL
SELECT 'LLC' as item
DECLARE @unwantedCharecters VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!, ]%'
WHILE PATINDEX( @unwantedCharecters, @NEWSTRING ) > 0
SELECT @NEWSTRING = ltrim(rtrim(Replace(REPLACE( @NEWSTRING, SUBSTRING( @NEWSTRING, PATINDEX( @unwantedCharecters, @NEWSTRING ), 1 ),''),'-',' ')))
SELECT @NEWSTRING = substring(rtrim(@NEWSTRING), 1, len(@newstring) - len(ITEM)) FROM #item WHERE rtrim(@NEWSTRING) LIKE '%' + ITEM
Each occurrence of the noise word should be removed, unless they appear in between other words.
This will do the trick:
WITH
DirtyValues AS(
SELECT * FROM (VALUES
(1, 'American Company, LLC') --Expected output --American (both noise words should be removed)
, (2, 'American LLC,LLC') --Expected output -- American
, (3, 'American Company American Company')-- American Company American (one noise word occurs in between other words, so it should not be removed)
) AS T(ID, Dirty)
),
NoisyWords AS(
SELECT * FROM (VALUES
(' ') -- Just apend the chars to be filtered to your noise word list
, (',')
, ('LLC')
, ('Company')
) AS T(Noisy)
),
DoSomeMagic AS(
SELECT ID
, Result = REVERSE(Dirty)
FROM DirtyValues
UNION ALL
SELECT ID
, Result = SUBSTRING(Result, DATALENGTH(Noisy)+1, DATALENGTH(Result))
FROM DoSomeMagic
CROSS APPLY(
SELECT
Noisy = REVERSE(Noisy)
FROM NoisyWords
) AS T
WHERE PATINDEX('%' + Noisy + '%', Result) = 1
),
PickBestResult AS(
SELECT DoSomeMagic.ID
, [clean as a whistle] = REVERSE(DoSomeMagic.Result)
, [Rank] = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATALENGTH(Result) ASC)
FROM DoSomeMagic
)
SELECT *
FROM PickBestResult
WHERE [Rank] = 1
What it does:
- First 2 CTE's are your datasets, you'll of course want to change them for your own tables.
- DoSomeMagic is recursive CTE, first reversing the string to be able to seek from the end and then cross applies all the noise words and checks that the now start of the string starts with the reverse noise word. If so, remove it and keep on going until no noise words are found at the start.
- PickBestResult will then [Rank] each row and the own with shortest result will get Rank 1.