I have a .NET regex which I am testing using Windows PowerShell. The output is as follows:
> [System.Text.RegularExpressions.Regex]::Match("aaa aaa bbb", "aaa.*?bbb")
Groups : {aaa aaa bbb}
Success : True
Captures : {aaa aaa bbb}
Index : 0
Length : 11
Value : aaa aaa bbb
My expectation was that using the `?` quantifier would cause the match to be `aaa bbb`, as the second group of a's is sufficient to satisfy the expression. Is my understanding of non-greedy quantifiers flawed, or am I testing incorrectly?
Note: this is plainly not the same problem as Regular Expression nongreedy is greedy
This is a common misunderstanding. Lazy quantifiers do not guarantee the shortest possible match. They only make sure that the current quantifier, from the current position, does not match more characters than needed for an overall match.
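A minimal sketch of this behaviour, written with Python's `re` module rather than .NET (an assumption made so the example is easy to run; both are backtracking engines and should treat these particular patterns the same way). The second pattern is only one illustrative way to forbid `aaa` and `bbb` inside the gap, not necessarily the pattern the answer had in mind.

```python
import re

text = "aaa aaa bbb"

# Lazy quantifier: the engine still starts at the leftmost "aaa" (index 0)
# and only grows ".*?" as far as needed from that starting point.
lazy = re.search(r"aaa.*?bbb", text)
print(lazy.group(0), lazy.start())  # aaa aaa bbb 0

# Illustrative pattern (an assumption): a character in the gap is consumed
# only if neither "aaa" nor "bbb" begins at that position, so the attempt
# that starts at index 0 fails and the engine moves on to index 4.
explicit = re.search(r"aaa(?:(?!aaa|bbb).)*bbb", text)
print(explicit.group(0), explicit.start())  # aaa bbb 4
```

The lazy match is the shortest one available from index 0, not the shortest one in the whole string.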
If you truly want to ensure the shortest possible match, you need to make that explicit. In this case, this means that instead of `.*?`, you want a subregex that matches anything that is neither `aaa` nor `bbb`. The resulting regex therefore consists of `aaa`, that subregex, and `bbb`.
Well, it's really simple. We have the following string: `aaa aaa bbb`.
Say we have this regex: `aaa.*?bbb`. The regex engine will start with the first `aaa`.
The regex engine now has `.*?bbb` left. It will proceed with the space, but there are still some characters before `bbb`, so the regex engine will continue on its way and match the second set of a's.
Finally, the regex engine will match `bbb`.
So, if we only want to match the second `aaa`, we could use the following regex: `(?<!^)aaa.*?bbb`. This means to match an `aaa` that is not at the beginning of the string.
We may also use `aaa(?= bbb).*?bbb`. This means to match an `aaa` that is followed by a space and `bbb`.
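The two alternative patterns can be tried out with a short sketch. It uses Python's `re` module instead of .NET, which is an assumption made for illustration; the lookbehind and lookahead used here should behave the same way in both engines. The variable names are hypothetical.

```python
import re

text = "aaa aaa bbb"

# Alternative 1: the negative lookbehind (?<!^) rejects an "aaa" that sits
# at the very start of the string, so the attempt at index 0 fails.
not_at_start = re.search(r"(?<!^)aaa.*?bbb", text)
print(not_at_start.group(0), not_at_start.start())  # aaa bbb 4

# Alternative 2: the lookahead (?= bbb) only accepts an "aaa" that is
# immediately followed by a space and "bbb".
followed_by_bbb = re.search(r"aaa(?= bbb).*?bbb", text)
print(followed_by_bbb.group(0), followed_by_bbb.start())  # aaa bbb 4

# Alternative 1 depends on the unwanted "aaa" being at index 0. Once the
# string is shifted, the first "aaa" is accepted again.
shifted = re.search(r"(?<!^)aaa.*?bbb", "x " + text)
print(shifted.group(0))  # aaa aaa bbb
```

Both patterns rely on knowing the layout of the input, so they are workarounds for this particular string rather than a general shortest-match solution.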
Just came to my senses, but why don't you directly use `aaa bbb`?
This is not a greedy/lazy problem. The problem comes from the fact that your string is analysed from left to right. When the first `aaa` is matched, the regex engine adds characters one by one until the complete pattern matches.
Note that with a greedy behaviour, in your example, you obtain the same result: the first `aaa` is matched, the regex engine takes all the remaining characters and backtracks character by character until it has the complete match.
Compare the result for the string `aaa aaa bbb bbb`:
The regex engine finds the first occurrence of `aaa` and then skips all characters (`.*?`) until the first occurrence of `bbb`, but for the greedy operator (`.*`) it will go on to find a larger result and therefore match the last occurrence of `bbb`.
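The comparison can be reproduced with a small sketch. Python's `re` module stands in for .NET here, which is an assumption made for illustration; for these two patterns the engines should agree.

```python
import re

lazy = re.compile(r"aaa.*?bbb")
greedy = re.compile(r"aaa.*bbb")

# On the original string both variants start at the first "aaa" and there
# is only one "bbb" to stop at, so the results are identical.
short_text = "aaa aaa bbb"
print(lazy.search(short_text).group(0))    # aaa aaa bbb
print(greedy.search(short_text).group(0))  # aaa aaa bbb

# With a second "bbb" the difference shows: lazy stops at the first "bbb"
# after the starting point, greedy backtracks from the end to the last one.
long_text = "aaa aaa bbb bbb"
lazy_long = lazy.search(long_text).group(0)
greedy_long = greedy.search(long_text).group(0)
print(lazy_long)    # aaa aaa bbb
print(greedy_long)  # aaa aaa bbb bbb
```

In both cases the match begins at index 0. The quantifier only decides where the match ends, never where it starts.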