Replace multiple words in a string from a list of words

Posted 2019-02-25 15:15

Question:

I have a list of words:

string[] BAD_WORDS = { "xxx", "o2o" }; // The real list is much bigger, about 100 words

I also have some text (usually short, at most 250 words) from which I need to remove all of the BAD_WORDS.

I have tried this:

    foreach (var word in BAD_WORDS)
    {
        string w = string.Format(" {0} ", word);
        if (input.Contains(w))
        {
            while (input.Contains(w))
            {
                input = input.Replace(w, " ");
            }
        }
    }

However, if the text starts or ends with a bad word, that word is not removed, because it has no space on one side. I added the spaces so that partial words are not matched: for example, "oxxx" should not be removed, since it is not an exact match for any of the BAD_WORDS.

Can anyone give me advice on this?

Answer 1:

string cleaned = Regex.Replace(input, "\\b" + string.Join("\\b|\\b",BAD_WORDS) + "\\b", "");
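A slightly more defensive variant of the same idea might look like the sketch below. It assumes the bad words may contain regex metacharacters (so each one is passed through Regex.Escape) and that the leftover double spaces should be collapsed. The helper name RemoveBadWords is hypothetical and not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (name is illustrative, not from the answer above).
string RemoveBadWords(string input, string[] badWords)
{
    // Escape each word so characters such as '.' or '+' are matched literally.
    string alternatives = string.Join("|", badWords.Select(w => Regex.Escape(w)));

    // \b on both sides keeps partial words such as "oxxx" from matching.
    string pattern = @"\b(?:" + alternatives + @")\b";

    string cleaned = Regex.Replace(input, pattern, "");

    // Collapse the whitespace runs left behind by removed words.
    return Regex.Replace(cleaned, @"\s+", " ").Trim();
}

string[] BAD_WORDS = { "xxx", "o2o" };
Console.WriteLine(RemoveBadWords("xxx this is oxxx and o2o", BAD_WORDS));
// prints "this is oxxx and"
```

Note that \b only behaves as expected when every bad word starts and ends with a letter, digit, or underscore; words with leading or trailing punctuation would need a different boundary check.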


Answer 2:

This is a good fit for LINQ together with the Split method. Try this:

return string.Join(" ",
                   input.Split(' ').Select(w => BAD_WORDS.Contains(w) ? "" : w));
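Because the snippet above substitutes an empty string for each bad word, the joined result can contain double spaces. One possible variation, sketched here under the assumptions that matching should be case-insensitive and that words are separated only by spaces, filters the words out instead and uses a HashSet for the lookup. The helper name RemoveListedWords is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (name is illustrative, not from the answer above).
string RemoveListedWords(string input, IEnumerable<string> badWords)
{
    // Case-insensitive matching is an assumption; the question does not specify it.
    var banned = new HashSet<string>(badWords, StringComparer.OrdinalIgnoreCase);

    var kept = input
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(w => !banned.Contains(w)); // drop the word instead of emitting ""

    return string.Join(" ", kept);
}

Console.WriteLine(RemoveListedWords("XXX is not oxxx o2o", new[] { "xxx", "o2o" }));
// prints "is not oxxx"
```

A bad word that is directly followed by punctuation (for example "xxx,") is not matched by this approach, since the comparison is against whole space-separated tokens.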


Answer 3:

You could use the StartsWith and EndsWith methods, like this:

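while (input.Contains(w))
{
   input = input.Replace(w, " ");
}
// w is " word ", so it can never match at the very start or end; handle those cases separately.
while (input.StartsWith(word + " ")) input = input.Substring(word.Length + 1);
while (input.EndsWith(" " + word)) input = input.Substring(0, input.Length - word.Length - 1);
if (input == word) input = "";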

Hope this will fix your problem.



Answer 4:

Add a padding space before and after the input string. That way the first and last words are detected as well.

input = " " + input + " ";

foreach (var word in BAD_WORDS)
{
    string w = string.Format(" {0} ", word);
    if (input.Contains(w))
    {
        while (input.Contains(w))
        {
            input = input.Replace(w, " ");
        }
    }
}

Then trim the string:

input = input.Trim();


Answer 5:

You can split the text into a list of words, then remove every word that appears in the bad-word list, something like this:

List<string> myWords = input.Split(' ').ToList();
List<string> badWords = GetBadWords();

myWords.RemoveAll(word => badWords.Contains(word));
string Result = string.Join(" ", myWords);


Answer 6:

Just wanted to point out that a single while loop inside your foreach would have been enough, like so:

foreach (var word in BAD_WORDS)
{
    while (input.Contains(String.Format(" {0} ", word)))
    {
        input = input.Replace(String.Format(" {0} ", word), " ");
    }
}

There is no need for the if or the 'w' variable. In any case, I would have used the answer above from Antonio Bakula; this was just the first thing that came to mind.



Answer 7:

According to the following post, the fastest way is to use Regex with a MatchEvaluator: "Replacing multiple characters in a string, the fastest way?"

        // \b on both sides keeps partial words such as "oxxx" from matching.
        Regex reg = new Regex(@"\b(o2o|xxx)\b");
        MatchEvaluator eval = match =>
        {
            switch (match.Value)
            {
                case "o2o": return " ";
                case "xxx": return " ";
                default: throw new Exception("Unexpected match!");
            }
        };
        input = reg.Replace(input, eval);