At the end of the page there is at attempted explanation of how do greedy, reluctant and possessive quantifiers work: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
However I tried myself an example and I don't seem to understand it fully.
I will paste my results directly:
Enter your regex: .*+foo
Enter input string to search: xfooxxxxxxfoo
No match found.
Enter your regex: (.*)+foo
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.
Why does the first reg.exp. find no match and the second does?
What is the exact difference between those 2 reg.exp.?
The +
after another quantifier means "don't allow the regex engine to backtrack into whatever the previous token has matched". (See a tutorial on possessive quantifiers here).
So when you apply .*foo
to "xfooxxxxxxfoo"
, the .*
first matches the entire string. Then, since foo
can't be matched, the regex engine backtracks until that's possible, achieving a match when .*
has matched "xfooxxxxxx"
and foo
has matched "foo"
.
Now the additional +
prevents that backtracking from happening, so the match fails.
When you write (.*)+foo
. the +
takes on an entirely different meaning; now it means "one or more of the preceding token". You've created nested quantifiers, which is not a good idea, by the way. If you apply that regex to a string like "xfoxxxxxxxxxfox"
, you'll run into catastrophic backtracking.
The possessive quantifier takes the entire string and checks if it matches, if not it fails. In your case xfooxxxxxxfoo matches the .*+
but then you ask for another foo
, which isn't present, so the matcher fails.
The greedy quantifier first does the same, but instead of failing it "backs off" and tries again:
xfooxxxxxxfoo fail
xfooxxxxxxfo fail
xfooxxxxxxf fail
xfooxxxxxx match
In your second regex you ask for something else by confusing the grouping mechanism. You ask for "one or more matches of (.*)" as the + now relates to the () and there is one match.