Regex replace everything except a particular pattern

Posted 2019-02-25 15:48

Question:

I'm looking to extract:

50%

From a string that will have more or less this format:

The 50% is in here somewhere.

I'd also like to extract:

50%50%25%

From a string like this:

50% of 50% is 25%

Regex.Match() seems the obvious contender. However, this involves checking if any matches were found (e.g. match.Success), extracting the results from a particular index in the array, and/or the risk of addressing an out-of-bounds index.

Regex replace is generally simpler to apply. A single line does the job, including returning the resulting string. The same is true in many languages.

result = Regex.Replace(input, stuffWeDontLike, "")

Basically, I am looking for a regex filter - instead of entering the pattern to replace, I want to enter the pattern to retrieve.

percentages = Regex.Filter("50% of 50% is 25%", @"[0-9]+\%")

Could we form a regex and invert the result, as if it were a selection? That would allow the use of regex replace. However, I could not find a way to easily invert a regex.

How can we achieve the desired result (or similar; a join or so seems acceptable) with very short and simple syntax, similar to regex replace?

Answer 1:

You can use Regex.Matches and concatenate the value of each match. Pick whichever of the two variants below you like best.

using System.Linq;
using System.Text.RegularExpressions;

//Sadly, we can't extend the Regex class
public static class RegExp
{
    //usage : RegExp.Filter("50% of 50% is 25%", @"[0-9]+\%")
    public static string Filter(string input, string pattern)
    {
        //concatenate the text of every match, in order of appearance
        return Regex.Matches(input, pattern).Cast<Match>()
            .Aggregate(string.Empty, (a, m) => a + m.Value);
    }
}

public static class StringExtension
{
    //usage : "50% of 50% is 25%".Filter(@"[0-9]+\%")
    public static string Filter(this string input, string pattern)
    {
        //same logic as above, exposed as an extension method on string
        return Regex.Matches(input, pattern).Cast<Match>()
            .Aggregate(string.Empty, (a, m) => a + m.Value);
    }
}


Answer 2:

I don't understand why you want to use replace here. Why go that way in the first place? The Regex class has methods that return exactly the matches you want, so the roundabout route through replace seems pointless to me.

Just use Matches() to collect the matches. You could then join them into the string that you wanted.

var str = "50% of 50% is 25%";
var re = new Regex(@"\d+%");
var ms = re.Matches(str);
var values = ms.Cast<Match>().Select(m => m.Value);
var joined = String.Join("", values); // "50%50%25%"
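The same idea fits in a one-line helper, which also covers the concern in the question about checking match.Success: when nothing matches, Matches() returns an empty collection and the result is simply an empty string. This is a minimal sketch; the names RegexFilter and Filter are made up for illustration and are not part of .NET.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexFilter
{
    // Hypothetical helper (not part of .NET): keeps only the parts of
    // input that match pattern, concatenated in order of appearance.
    public static string Filter(string input, string pattern)
    {
        // Matches() never returns null; with no matches the collection
        // is empty and string.Concat returns "".
        return string.Concat(
            Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }
}
```

With this, RegexFilter.Filter("50% of 50% is 25%", @"\d+%") gives "50%50%25%", and an input without any percentage gives "", with no index to address and no Success flag to check.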


Answer 3:

One solution is to use regex replace as follows:

Regex.Replace("50% of 50% is 25%", @"(\d+%)|(?:.+?)", "$1");

Output:

50%50%25%

As a general approach:

Regex.Replace(input, "(" + pattern + ")|(?:.+?)", "$1");

This finds anything that matches either of the following:

  • The pattern. Captured as $1. This is what we want to keep.
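  • Any character, one or more times, but non-greedy, so in practice one character at a time. This consumes anything that is not captured by the first group; for such a match $1 is empty, so the character is removed. ?: because we don't need to capture this group. Note that . does not match a newline unless RegexOptions.Singleline is set.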

As MSDN states: "$1 replaces the entire match with the first captured subexpression." (Applied to every match in turn, this leaves only the substrings we wanted, concatenated.)

Effectively, this is the described regex filter.
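The general approach could be wrapped up roughly as follows. Two things here go beyond the answer and are assumptions: the names ReplaceFilter and Keep are hypothetical, and RegexOptions.Singleline is added so that . also matches newlines. Without it, line breaks in the input would survive the replace.

```csharp
using System.Text.RegularExpressions;

public static class ReplaceFilter
{
    // Hypothetical helper (not part of .NET): removes everything from
    // input except the substrings that match pattern.
    public static string Keep(string input, string pattern)
    {
        // Group 1 wraps the pattern we want to keep. The second
        // alternative lazily consumes any other text, which in practice
        // means one character per match; for those matches $1 is empty.
        string combined = "(" + pattern + ")|(?:.+?)";

        // Singleline lets '.' match '\n' too. Note that it also applies
        // to any '.' inside the caller's pattern.
        return Regex.Replace(input, combined, "$1", RegexOptions.Singleline);
    }
}
```

For example, ReplaceFilter.Keep("50% of 50% is 25%", @"\d+%") returns "50%50%25%". The pattern is wrapped in its own group, so $1 still refers to the whole wanted match even if the pattern contains alternation or groups of its own.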



Tags: c# regex replace