I need to match @anything_here@ from the string @anything_here@dhhhd@shdjhjs@, so I tried the following regexes:
^@.*?@
or
^@[^@]*@
Both work, but I would like to know which is the better solution: the regex with non-greedy repetition or the regex with a negated character class?
Negated character classes should usually be preferred over lazy matching, if possible.
If the regex is successful, ^@[^@]*@ can match the content between the @s in a single step, while ^@.*?@ needs to expand once for each character between the @s.
When failing (for the case of no ending @), many regex engines will apply a little magic and internally treat [^@]* as the possessive [^@]*+, as there is a clear-cut border between @ and non-@. The engine will thus match to the end of the string, recognize the missing @, and fail instantly instead of backtracking. The lazy .*? will expand character by character as usual.
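As a quick check of the two cases above, here is a minimal Python sketch using the standard re module. The names NEGATED, LAZY and the unterminated sample string are made up for illustration. It only shows that both patterns agree on the result; Python's re does not expose how many steps each one takes, so the performance claim itself is not demonstrated here.

```python
import re

# The two candidate patterns from the question (names are illustrative).
NEGATED = re.compile(r"^@[^@]*@")
LAZY = re.compile(r"^@.*?@")

# Success case: the string from the question.
text = "@anything_here@dhhhd@shdjhjs@"
negated_hit = NEGATED.match(text)
lazy_hit = LAZY.match(text)
print(negated_hit.group(), lazy_hit.group())

# Failure case: a hypothetical input with no closing @.
unterminated = "@anything_here"
negated_miss = NEGATED.match(unterminated)
lazy_miss = LAZY.match(unterminated)
print(negated_miss, lazy_miss)
```

Both patterns return the same match on success and no match on failure; the difference discussed above is in how much work the engine does to get there.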
When used in larger contexts, [^@]* will also never expand past the closing @, while this is entirely possible with lazy matching. E.g. ^@[^@]*a[^@]*@ won't match @bbbb@a@ while ^@.*?a.*?@ will, because the first .*? is free to consume the middle @ in order to reach the a.
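The example above can be reproduced with a short Python sketch (standard re module; the variable names are illustrative only):

```python
import re

sample = "@bbbb@a@"

# Negated class: [^@]* can never cross the second @, so no 'a' is reachable.
negated_result = re.match(r"^@[^@]*a[^@]*@", sample)

# Lazy dot: the first .*? grows over "bbbb@" until it finds the 'a'.
lazy_result = re.match(r"^@.*?a.*?@", sample)

print(negated_result)
print(lazy_result.group() if lazy_result else None)
```

Here the lazy version matches the whole string, which is usually not what was intended when the @ characters are meant to act as delimiters.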
Note that [^@] will also match newlines, while . doesn't (in most regex engines, and unless single-line mode is enabled). If matching across newlines is not wanted, you can avoid it by adding the newline character to the negated class, e.g. [^@\n].
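A small Python sketch of the newline difference, assuming Python's re module, where single-line mode is spelled re.DOTALL. The sample string is made up for illustration:

```python
import re

multiline = "@first\nsecond@"

# The negated class matches any character except @, newline included.
negated_nl = re.match(r"^@[^@]*@", multiline)

# The dot does not match a newline by default, so this fails.
lazy_default = re.match(r"^@.*?@", multiline)

# With DOTALL (single-line mode) the dot matches newlines too.
lazy_dotall = re.match(r"^@.*?@", multiline, re.DOTALL)

# Adding \n to the negation restores the "no newlines" behaviour.
negated_no_nl = re.match(r"^@[^@\n]*@", multiline)

print(negated_nl is not None, lazy_default is not None)
print(lazy_dotall is not None, negated_no_nl is not None)
```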
It is clear the ^@[^@]*@ option is much better.
The negated character class is quantified greedily, which means the regex engine grabs zero or more chars other than @ right away, as many as possible. See this regex demo and matching:
When you use a lazy dot matching pattern, the engine matches @, then tries to match the trailing @ (skipping the .*?). It does not find the @ at index 1, so the .*? matches the a char. This .*? pattern expands as many times as there are chars other than @ up to the first @.
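To make the expansion count concrete, here is a toy Python model of the process just described. It is only a sketch: lazy_expansions is a hypothetical helper written for this illustration, not how any real regex engine is implemented, and it models only the pattern ^@.*?@.

```python
import re

def lazy_expansions(text):
    """Toy model of ^@.*?@ : count how often .*? must grow by one char.

    Returns None when the pattern would not match at all.
    """
    if not text.startswith("@"):
        return None
    expansions = 0
    for ch in text[1:]:
        if ch == "@":          # the trailing @ is found, the match ends here
            return expansions
        if ch == "\n":         # the dot does not match a newline by default
            return None
        expansions += 1        # .*? is forced to consume one more char
    return None                # ran out of input without a closing @

sample = "@anything_here@dhhhd@shdjhjs@"
steps = lazy_expansions(sample)
matched = re.match(r"^@.*?@", sample).group()

# The expansion count equals the number of chars between the two @s.
print(steps, len(matched) - 2)
```

For the string from the question, the count equals the length of anything_here, one expansion per character, whereas the greedy negated class consumes the same run in a single pass.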
See the lazy dot matching pattern demo here; these are the matching steps: