There was a question about regex, and while trying to answer it I found another strange thing.
String x = "X";
System.out.println(x.replaceAll("X*", "Y"));
This prints YY. Why?
String x = "X";
System.out.println(x.replaceAll("X*?", "Y"));
And this prints YXY
Why doesn't the reluctant regex match the 'X' character? The input is "nothing", then X, then "nothing". So why doesn't the first regex produce three matches, instead of matching two of those pieces at once and then one? And why does the second regex match only the "nothing"s and not X?
Let's consider them in turn.

"X*": there are two matches:
1. X is matched, and is replaced with Y.
2. The empty string after X is matched, and Y gets added to the output.
End result: YY

"X*?": there are also two matches:
1. The empty string before X is matched, and Y gets added to the output. The character at this position, X, was not consumed by the match, and is therefore copied into the output verbatim.
2. The empty string after X is matched, and Y gets added to the output.
End result: YXY
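To check the two walkthroughs above, here is a minimal sketch that lists every match each pattern finds in "X". The class name MatchTrace and the helper trace are made up for this illustration; only Pattern, Matcher, find(), start(), end() and group() come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTrace {
    // Hypothetical helper: one entry per match, formatted as start-end:"matched text".
    static List<String> trace(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":\"" + m.group() + "\"");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace("X*", "X"));   // [0-1:"X", 1-1:""]
        System.out.println(trace("X*?", "X"));  // [0-0:"", 1-1:""]
    }
}
```

Both patterns yield two matches, but only the greedy one consumes the X; the reluctant one leaves it between two empty matches.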
The * is a tricky 'quantifier' since it means '0 or more'. Thus, it also matches '0 times X' (i.e. an empty string).
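A small sketch of that point, using inputs that contain no X at all (the class and method names are invented for the example): since "X*" is satisfied by zero characters, it matches the empty string at every position.

```java
public class StarMatchesEmpty {
    // Hypothetical helper: applies the pattern from the question to any input.
    static String replaceStar(String input) {
        return input.replaceAll("X*", "Y");
    }

    public static void main(String[] args) {
        System.out.println("".matches("X*"));    // true: zero X's is a valid match
        System.out.println(replaceStar(""));     // Y
        System.out.println(replaceStar("abc"));  // YaYbYcY
    }
}
```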
I would use
which has the expected behaviour.
In your first example you are using a "greedy" quantifier. A greedy quantifier first consumes as much of the input as it can, so the first match tried is the whole input. Since that attempt succeeds, the matcher continues from the end of the input string and finds the zero-length match there, hence the two matches you see. The greedy matcher never backs off to the zero-length match before the character X, because the first match attempt was successful.
In the second example you are using a "reluctant" quantifier, which does the opposite of "greedy". It starts at the beginning and tries to match as little as possible, taking one more character at a time only if it has to. So the zero-length match before the "X" character is matched first. The matcher then moves forward by one (that's why you still see the "X" character in the output), where the next match is the zero-length match after the "X".
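To see where the leftover "X" comes from, here is a sketch that rebuilds the output by hand with Matcher.appendReplacement and Matcher.appendTail, the loop that the Matcher documentation describes as equivalent to replaceAll. The class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllByHand {
    // Hypothetical helper mirroring what replaceAll does internally.
    static String replaceAllByHand(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copies the text skipped since the previous match, then the replacement.
            m.appendReplacement(sb, replacement);
        }
        // Copies whatever remains after the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceAllByHand("X*", "X", "Y"));   // YY
        System.out.println(replaceAllByHand("X*?", "X", "Y"));  // YXY
    }
}
```

With "X*?", the second call to appendReplacement is the one that copies the skipped "X" into the output before adding the second "Y".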
There is a good tutorial here: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html