RegEx: Look-behind to avoid an odd number of consecutive backslashes

Posted 2020-02-10 04:43

Question:

I have user input in which some tags are allowed inside square brackets. I've already written the regex pattern that finds and validates what's inside the brackets.

In the user input field, an opening bracket ([) can be escaped with a backslash, and a backslash (\) can itself be escaped with another backslash. I need a look-behind sub-pattern that rejects an odd number of consecutive backslashes before the opening bracket.

At the moment I must deal with something like this:

(?<!\\)(?:\\\\)*\[(?<inside_brackets>.*?)]

It works, but the problem is that the pattern still matches the pairs of backslashes in front of the bracket (they become part of the overall match even though no group captures them), and the look-behind only checks whether one more single backslash precedes those pairs (or the bracket itself). If possible, I'd like the whole check to happen inside the look-behind group.
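
To see the issue concretely, here is a minimal sketch using Python's `re` module instead of PHP's `preg_*` functions (an assumption made only so the example is easy to run; the look-behind, lazy quantifier and named group behave the same way in PCRE). The named group is spelled `(?P<...>)` for Python, and `first_tag` is a hypothetical helper. The odd cases are rejected, but for even runs the overall match still contains the backslash pairs.

```python
import re

# The pattern from the question; the named group uses Python's (?P<...>) syntax.
ASKER = re.compile(r"(?<!\\)(?:\\\\)*\[(?P<inside_brackets>.*?)]")


def first_tag(text):
    """Hypothetical helper: (whole match, captured text), or None if no tag."""
    m = ASKER.search(text)
    return (m.group(0), m.group("inside_brackets")) if m else None


for line in [r"my [test] string", r"my \[test] string",
             r"my \\[test] string", r"my \\\[test] string"]:
    # For the "\\" case the whole match starts with the two backslashes.
    print(line, "->", first_tag(line))
```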

Example:

my [test] string is ok
my \[test] string is wrong
my \\[test] string is ok
my \\\[test] string is wrong
my \\\\[test] string is ok
my \\\\\[test] string is wrong
...
etc
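
The rule behind these examples is a parity check: a bracket starts a real tag only when the run of backslashes directly before it has even length. A plain, non-regex sketch of that rule (the helper name `is_unescaped_bracket` is hypothetical, and Python is used only for illustration) can serve as a reference when testing candidate patterns against the example lines:

```python
def is_unescaped_bracket(text: str, i: int) -> bool:
    """True if text[i] is '[' and is preceded by an even number of backslashes."""
    if text[i] != "[":
        return False
    count = 0
    j = i - 1
    # Walk left over the run of consecutive backslashes.
    while j >= 0 and text[j] == "\\":
        count += 1
        j -= 1
    return count % 2 == 0


# Rebuild the example lines: 0..5 backslashes in front of "[test]".
for n in range(6):
    line = "my " + "\\" * n + "[test] string"
    verdict = "ok" if is_unescaped_bracket(line, line.index("[")) else "wrong"
    print(line, "->", verdict)
```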

I'm working with PHP (PCRE).

Answer 1:

Last time I checked, PHP did not support variable-length lookbehinds. That is why you cannot use the trivial solution (?<![^\\](?:\\\\)*\\).
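
Python's `re` module enforces the same fixed-width rule for look-behinds, so it is a convenient place to watch the "trivial solution" fail. This is a sketch in Python rather than PHP, so the error text it prints is Python's, not PCRE's; the variable names are illustrative only.

```python
import re

# The (?:\\\\)* inside the look-behind gives it no fixed width.
VARIABLE = r"(?<![^\\](?:\\\\)*\\)\["

try:
    re.compile(VARIABLE)
    supported = True
except re.error as exc:
    # The engine refuses the pattern at compile time.
    supported = False
    print("rejected:", exc)
```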

The simplest workaround is to match the whole thing, including the backslash pairs, not just the bracketed part:

(?<!\\)((?:\\\\)*)\[(?<inside_brackets>.*?)]

The difference is that if you use this regex with preg_replace, you have to remember to prefix the replacement string with $1 so that the captured backslashes are put back.
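
As a rough analogue of that preg_replace call, here is a sketch with Python's `re.sub`, where `\1` plays the role of PHP's `$1`. The `<b>...</b>` rendering and the `render_tags` name are hypothetical, chosen only to make the effect visible.

```python
import re

# Answer 1's pattern: the even run of backslashes is captured in group 1.
PATTERN = re.compile(r"(?<!\\)((?:\\\\)*)\[(?P<inside_brackets>.*?)]")


def render_tags(text):
    # Hypothetical rendering of [x] as <b>x</b>; \1 (like $1 in preg_replace)
    # writes the consumed backslash pairs back into the output.
    return PATTERN.sub(r"\1<b>\g<inside_brackets></b>", text)


print(render_tags(r"my \\[test] string is ok"))
```

Leaving `\1` out of the template would silently delete the even backslash runs, turning `my \\[test]` into `my <b>test</b>`.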



Answer 2:

You could do it without any look-behinds at all. The (\\\\|[^\\]) alternation consumes either an escaped pair of backslashes or any single non-backslash character, so it can never consume a lone backslash:

^(\\\\|[^\\])*\[(?<brackets>.*?)\]
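
A quick way to check this pattern against the example lines is the following sketch in Python's `re` (used here in place of PHP; the group is renamed with Python's `(?P<...>)` syntax and `bracket_content` is a hypothetical helper):

```python
import re

# Answer 2's pattern: anchored at ^, each repetition consumes either an
# escaped backslash pair or one non-backslash character.
NO_LOOKBEHIND = re.compile(r"^(\\\\|[^\\])*\[(?P<brackets>.*?)\]")


def bracket_content(text):
    m = NO_LOOKBEHIND.search(text)
    return m.group("brackets") if m else None


for n in range(6):
    line = "my " + "\\" * n + "[test] string"
    print(line, "->", bracket_content(line))
```

Because the pattern is anchored at `^` and cannot step over a lone backslash, a line such as `my \[a] and [b]` produces no match at all, even though `[b]` is unescaped.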