Match only the nth occurrence using a regular expression

Posted 2020-04-18 08:23

I have a string with 3 dates in it like this:

XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx

I want to select the 2nd date in the string, the 20180208 one.

Is there a way to do this purely in the regex, without having to resort to pulling out the 2nd match in code? I'm using C#, if that matters.

Thanks for any help.

Tags: c# regex
4 answers
放荡不羁爱自由
Reply #2 · 2020-04-18 09:06

You can use System.Text.RegularExpressions.Regex.

See the following example:

using System;
using System.Text.RegularExpressions;

// Pattern taken from Jan's answer; this only shows how to apply it in C#.
Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)");
Match match = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx");
if (match.Success) // only read the capture group when the pattern matched
{
    Console.WriteLine(match.Groups[1].Value); // 20180208
}
啃猪蹄的小仙女
Reply #3 · 2020-04-18 09:14

You could use the regular expression

^(?:.*?\d{8}_){1}.*?(\d{8})

to save the 2nd date to capture group 1.

Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use

^(?:.*?\d{8}_){0}.*?(\d{8})

The regex engine performs the following operations.

^        # match the beginning of the string
(?:      # begin a non-capture group
  .*?    # match 0+ chars lazily
  \d{8}  # match 8 digits
  _      # match '_'
)        # end non-capture group
{n-1}    # execute non-capture group n-1 times (n >= 1)
.*?      # match 0+ chars lazily
(\d{8})  # match 8 digits in capture group 1

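The important thing to note is that .*?, because it is lazy, consumes as few characters as possible: it is extended one character at a time until the rest of the pattern can match. Inside the non-capture group that means it stops as soon as the next 8 characters are digits followed by '_'. A longer run of digits is therefore skipped whenever a delimiter must follow the 8 digits. For example, in the string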

_1234abcd_efghi_123456789_12345678_ABC

capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".
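
The {n-1} substitution described above can be wrapped in a small helper. This is only a sketch: the DateFinder class and the NthDate method are hypothetical names, not part of any library, and the code assumes that every date except possibly the one being captured is exactly 8 digits followed by '_', as in the question's sample string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateFinder
{
    // Hypothetical helper: returns the nth (1-based) 8-digit date, or null if the pattern does not match.
    public static string NthDate(string input, int n)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

        // Skip n-1 dates (each followed by '_'), then capture the next 8 digits in group 1.
        string pattern = @"^(?:.*?\d{8}_){" + (n - 1) + @"}.*?(\d{8})";
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

With the string from the question, DateFinder.NthDate(text, 2) should return "20180208", and a value of n larger than the number of dates should return null.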

我欲成王,谁敢阻挡
Reply #4 · 2020-04-18 09:15

You could use

^(?:[^_]+_){2}(\d+)

and take the first group (see a demo on regex101.com).


Broken down, this says

^              # start of the string
(?:[^_]+_){2}  # one or more non-underscore chars followed by _, twice
(\d+)          # capture digits

C# demo:

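using System;
using System.Text.RegularExpressions;

var pattern = @"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
// Regex.Match never returns null, so no null-conditional operator is needed
var result = Regex.Match(text, pattern).Groups[1].Value;
Console.WriteLine(result); // => 20180208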
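
If the match itself, rather than a capture group, should be the date, one possible variation is to move the prefix into a lookbehind. This is not part of the answer above: it is a sketch that relies on the .NET regex engine accepting quantifiers inside a lookbehind, and the LookbehindDemo class and SecondDate method are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Hypothetical helper: the whole match is the run of digits that starts
    // the third '_'-separated field; returns null when there is no match.
    public static string SecondDate(string input)
    {
        // The lookbehind asserts that exactly two "field_" prefixes precede the digits.
        Match match = Regex.Match(input, @"(?<=^(?:[^_]+_){2})\d+");
        return match.Success ? match.Value : null;
    }
}
```

Like the original pattern, this selects by field position, so it assumes the second date always sits in the third underscore-separated field.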
够拽才男人
Reply #5 · 2020-04-18 09:24

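Try this:

// sInputLine holds the input string; \d{8} finds every run of 8 digits
MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");

// Index 1 is the 2nd match; guard against inputs with fewer than two dates
string sSecond = matches.Count > 1 ? matches[1].Value : string.Empty;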
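
The same idea extends to any n by indexing into the MatchCollection. The following is a minimal sketch; the MatchPicker class and NthMatch method are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchPicker
{
    // Hypothetical helper: returns the nth (1-based) match of pattern in input,
    // or null when there are fewer than n matches.
    public static string NthMatch(string input, string pattern, int n)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        return n >= 1 && n <= matches.Count ? matches[n - 1].Value : null;
    }
}
```

For the string in the question, MatchPicker.NthMatch(text, @"\d{8}", 2) should give "20180208". Note that this does the selection in code rather than in the regex, which the question hoped to avoid.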
