Regular expression is being too greedy

Posted 2020-05-05 00:50

I am trying to write a regular expression but it’s being too greedy. The input string could be in either of the following formats:

STUFF_12_1234 or STUFF_1234

What I want to do is to create a regular expression to grab the characters after the last _. So in the above examples that would be the numbers “1234”. The number of characters after this last _ varies and they could be a combination of letters and numbers. I have tried the following expression:

_(.*?)\Z

This works for “STUFF_1234” by returning “1234”, but when I use it against “STUFF_12_1234” it returns “12_1234”.

Can anyone advise how the expression should be changed to fix this?

Tags: regex c#-4.0
5 answers
家丑人穷心不美
Post #2 · 2020-05-05 01:17

Two options.

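  1. With a regex:

    _([^_]*)\Z

    You just need to exclude the _ character from the captured text and anchor the match at the end of the string with \Z (without the anchor, the match starts at the first _). Or:

    _(\d*)\Z

    if you know the characters are always numeric (\d).

  2. With Substring:
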
    yourString.Substring(yourString.LastIndexOf('_')+1)
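A minimal sketch of both options from this answer, assuming .NET's System.Text.RegularExpressions. The class and method names (LastSegment, ByRegex, BySubstring) are hypothetical, chosen only for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class illustrating the two options above.
public static class LastSegment
{
    // Option 1: the captured text may not contain '_', and \Z pins the match
    // to the end of the input, so only the last '_' can start a match.
    // Returns null when the input contains no underscore at all.
    public static string ByRegex(string input)
    {
        Match m = Regex.Match(input, @"_([^_]*)\Z");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Option 2: plain string methods. LastIndexOf returns -1 when there is
    // no '_', so "+ 1" gives 0 and the whole string is returned in that case.
    public static string BySubstring(string input)
    {
        return input.Substring(input.LastIndexOf('_') + 1);
    }
}
```

Both methods return "1234" for STUFF_12_1234 and STUFF_1234. They differ only when the input has no underscore: the regex version reports no match (null), while the Substring version returns the whole string.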
    
Emotional °昔
Post #3 · 2020-05-05 01:18

There are at least 3 ways to grab the text appearing after the last underscore _:

  • Keep the current regex, but specify RegexOptions.RightToLeft. Since the input is then searched from right to left, the lazy quantifier matches as few characters as possible, stopping just after the last _ in the string.

  • Modify the regex to disallow underscore _ in the text you want to match:

    _([^_]*)\Z
    
  • Split the input string on _ and take the last item. String.Split is sufficient for this; there is no need for Regex.Split.
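A short sketch of the first and third approaches (the second is the same negated-class pattern shown in the other answers). The class and method names are hypothetical; RegexOptions.RightToLeft and String.Split are standard .NET APIs.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helpers illustrating the RightToLeft and Split approaches.
public static class AfterLastUnderscore
{
    // Keep the original lazy pattern but scan from right to left. The engine
    // starts at the end of the input and extends (.*?) leftward only until it
    // reaches an '_', which is therefore the last '_' in the string.
    // Returns null when the input contains no underscore.
    public static string RightToLeft(string input)
    {
        Match m = Regex.Match(input, @"_(.*?)\Z", RegexOptions.RightToLeft);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Split on '_' and take the last piece. With no '_' in the input,
    // Split returns a one-element array holding the whole string.
    public static string BySplit(string input)
    {
        return input.Split('_').Last();
    }
}
```

If you prefer to avoid LINQ, `parts[parts.Length - 1]` on the array returned by Split does the same job as Last().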

一夜七次
Post #4 · 2020-05-05 01:19

Use the regexp

_([^_]*)\Z

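Regular expressions search left to right, so greediness controls whether a match stops early or late. But it won't change the location of the left end of the match: the engine reports the match that starts at the leftmost possible position, and for _(.*?)\Z that is the first _. Excluding _ from the captured characters makes the first _ an impossible starting point, so the match can only begin at the last one.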
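To see that greediness affects only where a match ends, this small sketch (class and method names are hypothetical) reports where each pattern's match begins in "STUFF_12_1234" and what it captures. The lazy _(.*?)\Z and greedy _(.*)\Z both start at index 5 (the first _), while _([^_]*)\Z can only start at index 8 (the last _).

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: where does each pattern's match start, and what does it capture?
public static class MatchStartDemo
{
    // Index of the first character of the match, or -1 if there is no match.
    public static int StartIndex(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Index : -1;
    }

    // Text captured by group 1, or null if there is no match.
    public static string Capture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For example, MatchStartDemo.Capture("STUFF_12_1234", @"_(.*?)\Z") gives "12_1234", exactly the result from the question, whereas the negated-class pattern gives "1234".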

趁早两清
Post #5 · 2020-05-05 01:28

Try this:

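using System.Text.RegularExpressions;

// Match '_' plus one or more chars that are neither '_' nor '.', at the end; Trim drops the '_'.
string s_YourString = "STUFF_12_34";
string s_OP = Regex.Match(s_YourString, "_[^_.]+$").Value.Trim('_');  // Output: 34
s_YourString = "STUFF_1234";
s_OP = Regex.Match(s_YourString, "_[^_.]+$").Value.Trim('_');          // Output: 1234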
时光不老,我们不散
Post #6 · 2020-05-05 01:34

Exclude the _ from the list of valid chars:

_([^_]*)\Z