Search and Replace a string t-SQL

2019-07-19 13:49发布

问题:

Everyone I am trying to write a query to replace all occurrences of a string at the end. I have some noise words(104 to be exact) that need to be removed from the string if they appear at the end.

Two noise words for example are --Company, LLC

Here are some examples and expected output:

American Company, LLC --Expected output --American (both noise words should be removed)
American LLC,LLC --Expected output -- American
American Company American Company-- American Company American (one noise word occurs in between other words, so it should not be removed)

currently I have this query:

DECLARE @NEWSTRING VARCHAR(max) 
DECLARE @NEWSTRINGlength nvarchar(max)

SET @NEWSTRING = 'American Company American Company Company, LLC  LLC' ; 

SET @NEWSTRINGlength = len(@newstring)
SELECT @NEWSTRINGlength

CREATE TABLE #item (item Nvarchar(250) null)

INSERT INTO #item

SELECT 'Company' as item
UNION ALL 
SELECT 'LLC' as item

DECLARE @unwantedCharecters  VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!, ]%'

WHILE PATINDEX( @unwantedCharecters, @NEWSTRING ) > 0
SELECT @NEWSTRING = ltrim(rtrim(Replace(REPLACE( @NEWSTRING, SUBSTRING( @NEWSTRING, PATINDEX( @unwantedCharecters, @NEWSTRING ), 1 ),''),'-',' ')))

SELECT @NEWSTRING = substring(rtrim(@NEWSTRING), 1, len(@newstring) - len(ITEM)) FROM #item WHERE  rtrim(@NEWSTRING) LIKE '%' + ITEM

Each occurrence of the noise word should be removed, unless they appear in between other words.

回答1:

This will do the trick:

WITH 
DirtyValues AS(
    SELECT * FROM (VALUES
          (1, 'American Company, LLC') --Expected output --American (both noise words should be removed)
        , (2, 'American LLC,LLC') --Expected output -- American
        , (3, 'American Company American Company')-- American Company American (one noise word occurs in between other words, so it should not be removed)
    ) AS T(ID, Dirty)
),
NoisyWords AS(
    SELECT * FROM (VALUES
          (' ') -- Just apend the chars to be filtered to your noise word list
        , (',')
        , ('LLC')
        , ('Company')
    ) AS T(Noisy)
),
DoSomeMagic AS(
    SELECT ID
         , Result = REVERSE(Dirty)
    FROM DirtyValues 
    UNION ALL 
    SELECT ID
         , Result = SUBSTRING(Result, DATALENGTH(Noisy)+1, DATALENGTH(Result))
    FROM DoSomeMagic
        CROSS APPLY(
            SELECT 
                  Noisy = REVERSE(Noisy)
            FROM NoisyWords
        ) AS T
    WHERE PATINDEX('%' + Noisy + '%', Result) = 1
),
PickBestResult AS(
    SELECT DoSomeMagic.ID
         , [clean as a whistle] = REVERSE(DoSomeMagic.Result)
         , [Rank]               = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DATALENGTH(Result) ASC)
    FROM DoSomeMagic
)
SELECT *
FROM PickBestResult
WHERE [Rank] = 1

What it does:

  • First 2 CTE's are your datasets, you'll of course want to change them for your own tables.
  • DoSomeMagic is recursive CTE, first reversing the string to be able to seek from the end and then cross applies all the noise words and checks that the now start of the string starts with the reverse noise word. If so, remove it and keep on going until no noise words are found at the start.
  • PickBestResult will then [Rank] each row and the own with shortest result will get Rank 1.