I wonder what the problem is with the backreference here:
preg_match_all('/__\((\'|")([^\1]+)\1/', "__('match this') . 'not this'", $matches);
It is expected to match the string between __(' and '), but it actually returns:
match this') . 'not this
Any ideas?
Make your regex ungreedy:
You can use something like:
/__\(("[^"]+"|'[^']+')\)/
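Here is a quick sketch that checks this pattern against the string from the question. It uses Python's re, which handles these constructs the same way PCRE (and therefore preg_match_all) does. Note that the capture includes the quotes themselves:

```python
import re

subject = "__('match this') . 'not this'"

# Each alternative spells out its own closing quote, so the quoted
# text can never run past the first matching quote character.
pattern = re.compile(r"""__\(("[^"]+"|'[^']+')\)""")

single = pattern.search(subject).group(1)
double = pattern.search('__("match that")').group(1)
print(single)  # 'match this'
print(double)  # "match that"
```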
I'm surprised it didn't give you an unbalanced parenthesis error message.
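This
[^\1]
will not take the contents of capture buffer 1 and put it into a character class. It doesn't even exclude the character '1': inside a character class, \1 is read as an octal escape, so [^\1] matches every character except the control character U+0001.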
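A short Python sketch reproduces the question's result and shows what [^\1] actually excludes. Python's re, like PCRE, reads a backslash followed by a digit inside a character class as an octal escape, so the behavior carries over:

```python
import re

subject = "__('match this') . 'not this'"

# The question's pattern. Inside [...], \1 is an octal escape (U+0001),
# not a reference to group 1, so [^\1]+ means "anything except \x01".
original = re.compile(r"""__\(('|")([^\1]+)\1""")
overshoot = original.search(subject).group(2)
print(overshoot)  # match this') . 'not this

# What the class really excludes:
allows_one = bool(re.fullmatch(r"[^\1]", "1"))      # a literal '1' is allowed
allows_ctrl = bool(re.fullmatch(r"[^\1]", "\x01"))  # only U+0001 is excluded
print(allows_one, allows_ctrl)  # True False
```

The greedy [^\1]+ runs to the end of the string, then backtracks just far enough for \1 to find the last ' in the subject.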
Try this:
/__\(('|").*?\1\).*/
You can add an inner capturing group to capture just what's between the quotes:
/__\(('|")(.*?)\1\).*/
Edit: If the delimiter is not allowed inside the quoted string, use Qtax's regex.
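A Python sketch comparing the lazy pattern with a class-based alternation (the ('[^']*'|"[^"]*") form, with its double-quoted branch closed by a "). The string __('all'this'will"match') is an assumed test case chosen to contain the delimiter inside the quotes:

```python
import re

lazy = re.compile(r"""__\(('|")(.*?)\1\).*""")
strict = re.compile(r"""__\(('[^']*'|"[^"]*")\)""")

simple = lazy.search("__('match this') . 'not this'").group(2)
print(simple)  # match this

# .*? keeps growing until it finds the quote immediately followed by ")",
# so the captured text may still contain the delimiter.
tricky = """__('all'this'will"match')"""
overrun = lazy.search(tricky).group(2)
print(overrun)  # all'this'will"match

# The class-based version refuses a delimiter inside the quotes.
rejected = strict.search(tricky)
print(rejected)  # None
```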
Since
('|").*?\1
will, even though non-greedy, still match everything up to the trailing anchor (in this case, all of __('all'this'will"match')), it's better to use
('[^']*'|"[^"]*")
instead.
You can't use a backreference inside a character class because a character class matches exactly one character, and a backreference can potentially match any number of characters, or none.
What you're trying to do requires a negative lookahead, not a negated character class:
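A minimal sketch of that idea, written as a reconstruction rather than the answerer's exact regex, using Python's re (its lookahead and backreference syntax matches PCRE's here). Each character is consumed only if it is not the quote captured in group 1:

```python
import re

# Reconstruction (assumption): a "tempered" dot that refuses to step
# over the opening quote stored in group 1.
pattern = re.compile(r"""__\((['"])((?:(?!\1).)+)\1\)""")

single = pattern.search("__('match this') . 'not this'").group(2)
double = pattern.search("""__("it's fine")""").group(2)
print(single)  # match this
print(double)  # it's fine
```

The second call shows the benefit over a fixed class: the other kind of quote may appear inside the string. In PHP the same pattern would be written '/__\(([\'"])((?:(?!\1).)+)\1\)/'.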
I also changed your alternation,
\'|"
to a character class,
[\'"]
because it's much more efficient, and I escaped the outer parentheses to make them match literal parentheses.
EDIT: I guess I need to expand on that "more efficient" remark. I took the example Friedl used to demonstrate this point and tested it in RegexBuddy.
Applied to the target text
abababdedfg
^[a-g]+$ reports success after three steps, while ^(?:a|b|c|d|e|f|g)+$ takes 55 steps. And that's for a successful match. When I try it on
abababdedfz
^[a-g]+$ reports failure after 21 steps; ^(?:a|b|c|d|e|f|g)+$ takes 99 steps.
In this particular case the impact on performance is so trivial it's not even worth mentioning. I'm just saying that whenever you find yourself choosing between a character class and an alternation that both match the same things, you should almost always go with the character class. Just a rule of thumb.
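The rule of thumb only applies when both forms match the same things. A quick Python check on the two RegexBuddy test strings confirms they are interchangeable here; Python's re does not report RegexBuddy-style step counts, so this checks equivalence, not cost:

```python
import re

char_class = re.compile(r"^[a-g]+$")
alternation = re.compile(r"^(?:a|b|c|d|e|f|g)+$")

results = {}
for text in ("abababdedfg", "abababdedfz"):
    # Both forms must agree before swapping one for the other is safe.
    results[text] = (bool(char_class.match(text)), bool(alternation.match(text)))
    print(text, results[text])
# abababdedfg (True, True)
# abababdedfz (False, False)
```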