Regex lookahead only removing the last character

Posted 2020-04-17 06:07

I'm creating a regex that searches for a text, but only if there isn't a dash after the match. I'm using a lookahead for this:

  • Regex: Text[\s\.][0-9]*(?!-)

  Input      Expected result   Result
  --------   ---------------   -------
  Text 11    Text 11           Text 11
  Text 52-   <No Match>        Text 5

Test case: https://regex101.com/r/doklxc/1/

The lookahead only seems to be matching with the previous character, which leaves me with Text 5, while I need it to not return a match at all.

I'm checking the https://www.regular-expressions.info/ guides and tried using groups, but I can't wrap my head around this one.

How can I make the lookahead apply to the entire preceding match?

I'm using the default .NET System.Text.RegularExpressions library.

1 answer
乱世女痞
#2 · 2020-04-17 06:25

The [0-9]* backtracks and lets the regex engine find a match even if there is a -.
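
The backtracking can be reproduced with a short script. This is only a sketch: it uses Python's `re` module rather than .NET, on the assumption that the two engines treat this particular pattern the same way, and `first_match` is a hypothetical helper introduced for the illustration.

```python
import re

# The asker's original pattern, unchanged.
ORIGINAL = r"Text[\s\.][0-9]*(?!-)"


def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


for sample in ["Text 11", "Text 52-"]:
    print(repr(sample), "->", first_match(ORIGINAL, sample))

# For "Text 52-": [0-9]* first takes "52", the lookahead sees "-" and fails,
# so the engine gives back one digit. Now the next character is "2", not "-",
# the lookahead succeeds, and the match is "Text 5".
```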

There are two ways: either use atomic groups or check for a digit in the lookahead:

Text[\s.][0-9]*(?![-\d])

Or

Text(?>[\s.][0-9]*)(?!-)

See the regex demo #1 and the regex demo #2.

Details

  • Text[\s.][0-9]*(?![-\d]) matches Text, then a dot or a whitespace, then 0 or more digits, and then checks whether there is a - or a digit immediately to the right; if there is, the match fails. Even when the engine backtracks and tries matching fewer digits than it grabbed before, the next character is then a digit, so the \d in the lookahead fails those attempts too.
  • Text(?>[\s.][0-9]*)(?!-) matches Text, then opens an atomic group. Once the patterns inside the group have matched, the engine cannot backtrack into the group, so (?!-) is checked only once, right after [0-9]* has grabbed all the digits it can.
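
Both fixes can be checked against the inputs from the question. Again this is a sketch in Python rather than .NET, assuming the engines agree on these patterns; `first_match` is a hypothetical helper. Python's `re` only supports the atomic group syntax `(?>...)` from version 3.11, so that pattern is skipped on older interpreters.

```python
import re
import sys

# The two patterns from the answer, unchanged.
LOOKAHEAD_FIX = r"Text[\s.][0-9]*(?![-\d])"
ATOMIC_FIX = r"Text(?>[\s.][0-9]*)(?!-)"


def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


patterns = [LOOKAHEAD_FIX]
if sys.version_info >= (3, 11):
    # Atomic groups are not available in Python's re before 3.11.
    patterns.append(ATOMIC_FIX)

for pattern in patterns:
    for sample in ["Text 11", "Text 52-"]:
        print(pattern, repr(sample), "->", first_match(pattern, sample))
```

With either pattern, "Text 11" still matches in full, while "Text 52-" yields no match at all instead of the partial "Text 5".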