I want to capture a stream of digits which are not followed by certain digits. For example:
input = abcdef lookbehind 123456..... asjdnasdh lookbehind 789432
I want to capture `789432` and not `123`, using negative lookahead only.
I tried `(?<=lookbehind )([\d])+(?!456)`, but it captures `123456` and `789432`.
Using `(?<=lookbehind )([\d])+?(?!456)` captures only `1` and `7`.
Grouping is not an option for me, as my use case doesn't allow it.
Is there any way I can capture `789432` and not `123` using pure regex?
An explanation for the answer is appreciated.
You may use a possessive quantifier with a negative lookbehind: `(?<=lookbehind )\d++(?<!456)`
A synonymous pattern with an atomic group: `(?<=lookbehind )(?>\d+)(?<!456)`
Details
`(?<=lookbehind )` - a positive lookbehind that matches a location in the string that is immediately preceded with `lookbehind ` (including the trailing space)
`\d++` - 1+ digits matched possessively, allowing no backtracking into the pattern (the engine cannot retry the match by giving back any digit matched with `\d++`)
`(?<!456)` - a negative lookbehind check that fails the match if the last 3 digits matched with `\d++` are `456`
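As a quick check, the possessive pattern built from the pieces above and its atomic-group equivalent can be run against the sample input from the question. This is only a sketch: Python is an assumption here, since the question does not name a regex flavor, and Python's `re` module accepts possessive quantifiers and atomic groups only from version 3.11, so the snippet guards on that. The names `text`, `possessive`, `atomic` and `SUPPORTS_POSSESSIVE` are illustrative.

```python
import re
import sys

# Sample input from the question.
text = "abcdef lookbehind 123456..... asjdnasdh lookbehind 789432"

# Python's re module supports possessive quantifiers and atomic groups
# starting with version 3.11; older versions reject these patterns.
SUPPORTS_POSSESSIVE = sys.version_info >= (3, 11)

if SUPPORTS_POSSESSIVE:
    # \d++ grabs all digits and never gives any back, so (?<!456) runs once.
    possessive = re.findall(r"(?<=lookbehind )\d++(?<!456)", text)
    # (?>\d+) is the atomic-group spelling of the same idea.
    atomic = re.findall(r"(?<=lookbehind )(?>\d+)(?<!456)", text)
    print(possessive)  # ['789432']
    print(atomic)      # ['789432']
else:
    possessive = atomic = None
    print("possessive quantifiers need Python 3.11 or newer")
```

Both spellings reject `123456` outright instead of shrinking it, which is the behavior the question asks for.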
The negative lookbehind `(?<!...)` makes sure that a certain pattern does not match immediately to the left of the current location. A negative lookahead `(?!...)` fails the match if its pattern matches immediately to the right of the current location. "Fail" here means that the regex engine abandons the current way of matching a string, and if there are quantified patterns before the lookbehind/lookahead, the engine might backtrack into those patterns to try to match the string differently. Note that here, the possessive quantifier makes it impossible for the engine to perform the lookbehind check for `456` multiple times: it is executed only once, after all the digits are grabbed with `\d++`.
Your `(?<=lookbehind )([\d])+(?!456)` regex matches `123456` because `\d+` matches these digits in a greedy way (all at once) and `(?!456)` checks for `456` after them; since there is no `456` there, the match is returned. `(?<=lookbehind )([\d])+?(?!456)` matches only one digit because `\d+?` matches in a lazy way: 1 digit is matched and then the lookahead check is performed. Since there is no `456` after `1`, `1` is returned.
A lookaround on its own does not stop the regex engine from retrying the match differently if there are quantified patterns before it. So, `(?<=lookbehind )\d+(?<!456)` matches `12345` in `123456`: once the lookbehind fails after `123456`, `\d+` gives back the `6`, and there is no `456` immediately before the `6`.
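The three behaviors described above (greedy, lazy, and backtracking into `\d+`) can be reproduced with a short script. This is a minimal sketch, assuming Python's `re` module; the capturing group from the question's `([\d])+` is written as plain `\d+` so that `re.findall` returns whole matches, and the variable names are illustrative.

```python
import re

# Sample input from the question.
text = "abcdef lookbehind 123456..... asjdnasdh lookbehind 789432"

# Greedy: \d+ takes all digits, then (?!456) only looks at what follows them.
greedy = re.findall(r"(?<=lookbehind )\d+(?!456)", text)
print(greedy)  # ['123456', '789432']

# Lazy: \d+? stops after one digit, and "234" / "894" are not "456".
lazy = re.findall(r"(?<=lookbehind )\d+?(?!456)", text)
print(lazy)  # ['1', '7']

# Non-possessive with a lookbehind: \d+ gives back the 6, and "345" passes.
backtracking = re.findall(r"(?<=lookbehind )\d+(?<!456)", text)
print(backtracking)  # ['12345', '789432']
```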
You may use a negative lookbehind as well: `(?<=lookbehind )\d+\b(?<!456)`
RegEx Details:
`(?<=lookbehind )`: Positive lookbehind to assert that we have "lookbehind " before current position
`\d+\b`: Match 1+ digits followed by a word boundary
`(?<!456)`: Negative lookbehind to assert that we don't have `456` before current position
Alternative solution using a negative lookahead:
We need `\d*` in the lookahead expression `(?!\d*456)` so that the match is skipped when `456` appears after 0 or more digits from the current position.
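Both approaches from this answer can be tried against the question's input. This is a sketch only, assuming Python's `re` module. The word-boundary pattern is assembled from the pieces listed under the details; the full lookahead pattern, `(?<=lookbehind )(?!\d*456)\d+`, is an assumption, since only the `(?!\d*456)` part is spelled out above. The extra input `"lookbehind 4567"` is a made-up example.

```python
import re

# Sample input from the question.
text = "abcdef lookbehind 123456..... asjdnasdh lookbehind 789432"

# \b forces \d+ to run to the end of the digit run before (?<!456) is checked,
# so backtracking to "12345" is not possible (no boundary between 5 and 6).
word_boundary = re.findall(r"(?<=lookbehind )\d+\b(?<!456)", text)
print(word_boundary)  # ['789432']

# Assumed full pattern: the lookahead rejects the start position up front
# if "456" occurs after zero or more digits.
lookahead = re.findall(r"(?<=lookbehind )(?!\d*456)\d+", text)
print(lookahead)  # ['789432']

# Hypothetical input where "456" is inside the run but not at its end.
other = "lookbehind 4567"
print(re.findall(r"(?<=lookbehind )\d+\b(?<!456)", other))  # ['4567']
print(re.findall(r"(?<=lookbehind )(?!\d*456)\d+", other))  # []
```

With the pattern as assumed here, the lookahead variant rejects any digit run that contains `456` anywhere, while the lookbehind variant only rejects runs that end with it, so the two may not be interchangeable for every input.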