I'm trying to understand non-greedy regexes in Python, but I don't understand why the following examples give these results:
print(re.search('a??b','aaab').group())
ab
print(re.search('a*?b','aaab').group())
aaab
I thought it would be 'b' for the first and 'ab' for the second. Can anyone explain that?
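It's because ?? is lazy while ? is greedy. A lazy quantifier matches zero or one occurrence of the token to its left, and it chooses zero whenever that still allows the overall pattern to match. For example, a pattern consisting only of a?? returns an empty string, because matching zero times already succeeds.
The regex a??b will match 'ab' or 'b', whichever lets the overall pattern match at the leftmost possible position.
If nothing allows the overall pattern to match, for instance because there is no b in the string, re.search returns None.
As for the second part, you have a non-greedy regex and the same rule applies: a*?b matches as few a characters as it can and then b. Starting from the first position in 'aaab', that means all three a characters.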
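The cases described above can be checked directly. The following is a minimal sketch; the input strings are illustrative choices, not the ones from the original answer.

```python
import re

# A lazy a?? on its own is satisfied by zero repetitions: the match is empty.
lazy_alone = re.search('a??', 'aaa').group()
# The greedy a? takes one 'a' when it can.
greedy_alone = re.search('a?', 'aaa').group()
print(repr(lazy_alone), repr(greedy_alone))  # '' 'a'

# a??b matches 'b' when that is enough, and 'ab' when the 'a' is needed
# for the match to succeed at the leftmost possible position.
results = {text: re.search('a??b', text).group()
           for text in ('b', 'xb', 'ab', 'aaab')}
print(results)  # {'b': 'b', 'xb': 'b', 'ab': 'ab', 'aaab': 'ab'}

# With no 'b' in the string nothing can match, so search returns None.
no_match = re.search('a??b', 'aaa')
print(no_match)  # None
```

Note that in 'ab' the lazy quantifier still ends up consuming the a: skipping it would make the match fail at position 0, so the engine expands to one repetition before giving up on that position.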
Explanation for the pattern /a??b/:
a?? matches the character a literally (case sensitive). The quantifier ?? means between zero and one time, as few times as possible, expanding as needed [lazy]. Then the character b must match literally (case sensitive).
So it will match the last two characters, 'ab', in the given string 'aaab'.
And for the pattern /a*?b/:
a*? matches the character 'a' literally (case sensitive). Here the quantifier *? means between zero and unlimited times, as few times as possible, expanding as needed [lazy]. Then the character b must match literally (case sensitive).
So it will match 'aaab' as a whole in 'aaab'.
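To see where each match starts and ends, one can print the span of the match object. This sketch also includes the greedy variants a?b and a*b for comparison; they are not part of the question and are added here only as an illustration.

```python
import re

text = 'aaab'
spans = {}
for pattern in ('a??b', 'a*?b', 'a?b', 'a*b'):
    m = re.search(pattern, text)
    spans[pattern] = (m.group(), m.span())
    print(f'{pattern!r:8} matched {m.group()!r:8} at span {m.span()}')

# 'a??b'   matched 'ab'     at span (2, 4)
# 'a*?b'   matched 'aaab'   at span (0, 4)
# 'a?b'    matched 'ab'     at span (2, 4)
# 'a*b'    matched 'aaab'   at span (0, 4)
```

On this input the lazy and greedy versions give the same result: because b has to follow the run of a characters, laziness has no room to shorten the match once the starting position is fixed.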
This happens because the matches you are expecting start later in the string. If you follow how the matching for a??b proceeds from left to right, you'll see something like this:
zero a plus b vs 'aaab': no match (b != a)
one a plus b vs 'aaab': no match (ab != aa)
zero a plus b vs 'aab': no match (b != a) (match position moved to the right by one)
one a plus b vs 'aab': no match (ab != aa)
zero a plus b vs 'ab': no match (b != a) (match position moved to the right by one)
one a plus b vs 'ab': match (ab == ab)
The same reasoning applies to *?.
The fact is that the search function returns the leftmost match. Using ?? and *? only changes the behaviour to prefer the shortest match at that leftmost position; it will not return a shorter match that starts to the right of an already found match.
Also note that the re module doesn't return overlapping matches, so even with findall or finditer you will not be able to find the two matches you are looking for.