How do I match a pattern with optional surrounding quotes?

Posted 2020-02-27 23:54

Question:

How would one write a regex that matches a pattern that can contain quotes, but if it does, must have matching quotes at the beginning and end?

"?(pattern)"?

Will not work because it will allow patterns that begin with a quote but don't end with one.

"(pattern)"|(pattern)

Will work, but is repetitive. Is there a better way to do that without repeating the pattern?
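
As a quick check of the two attempts above, here is a minimal sketch in Python. The choice of Python's re module is an assumption (the question does not name an engine), and the literal word pattern stands in for the real sub-pattern.

```python
import re

# "pattern" is a placeholder for the real sub-pattern, as in the question.
naive = re.compile(r'"?(pattern)"?')            # each quote is independently optional
verbose = re.compile(r'"(pattern)"|(pattern)')  # both quotes or neither

samples = ['pattern', '"pattern"', '"pattern', 'pattern"']

# fullmatch() requires the regex to consume the whole string.
naive_results = {s: bool(naive.fullmatch(s)) for s in samples}
verbose_results = {s: bool(verbose.fullmatch(s)) for s in samples}
```

The first regex accepts all four inputs, including the two with a quote on only one side; the alternation rejects those two.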

Answer 1:

You can get a solution that doesn't repeat the pattern by making use of backreferences and conditionals:

/^(")?(pattern)(?(1)\1|)$/

Matches:

  • pattern
  • "pattern"

Doesn't match:

  • "pattern
  • pattern"

This pattern is somewhat complex, however. It first looks for an optional quote and, if one is found, captures it in group 1. Then it matches your pattern. Finally, the conditional (?(1)\1|) says: if group 1 took part in the match, match its contents again (the closing quote); otherwise match nothing. The whole pattern is anchored (so it has to span the entire string or line) so that input with an unbalanced quote is rejected outright; without the anchors, the bare pattern inside pattern" would still match.

Note that support for conditionals varies by engine and the more verbose but repetitive expressions will be more widely supported (and likely easier to understand).
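
For illustration, here is a small sketch of the conditional version in Python, whose re module happens to support the (?(1)...|...) syntax. The names QUOTED and is_balanced are made up for this example, and pattern is again just a placeholder.

```python
import re

# Group 1 captures the optional opening quote. The conditional (?(1)\1|)
# demands the same quote again only if group 1 took part in the match.
QUOTED = re.compile(r'^(")?(pattern)(?(1)\1|)$')

def is_balanced(text):
    # True for the bare or the fully quoted form, False otherwise.
    return QUOTED.search(text) is not None

checks = {s: is_balanced(s) for s in ['pattern', '"pattern"', '"pattern', 'pattern"']}
```

Here checks should be True for the first two inputs and False for the two with a single quote.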


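Update: A much simpler version of this regex is /^(")?(pattern)\1$/, which does not need a conditional. When I was testing this initially, the tester I was using gave me a false negative, which led me to discount it (oops!). Be aware that the result depends on how the engine treats a backreference to a group that did not take part in the match: JavaScript lets it match the empty string, but Perl, PCRE and Python make it fail, so in those engines the unquoted form is rejected. Putting the optional quantifier inside the group, /^("?)(pattern)\1$/, avoids the problem, because group 1 then always matches (possibly the empty string).

I'll leave the solution with the conditional up for posterity and interest, but this simpler version is more likely to work in a wider variety of engines (backreferences are the only feature used here that might be unsupported).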
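
A short sketch of the backreference-only approach in Python, comparing two ways of writing the optional quote. Python is an assumption here (the answer does not name an engine), and the variable names are made up for the example. In Python, a backreference to a group that did not take part in the match fails, so the optional quantifier is safer inside the group.

```python
import re

unset_group = re.compile(r'^(")?(pattern)\1$')  # group 1 may be unset
empty_group = re.compile(r'^("?)(pattern)\1$')  # group 1 always set, possibly ''

samples = ['pattern', '"pattern"', '"pattern', 'pattern"']

unset_results = [bool(unset_group.search(s)) for s in samples]
empty_results = [bool(empty_group.search(s)) for s in samples]
```

With Python's re, the first variant rejects the bare pattern, while the second accepts both the bare and the fully quoted form and rejects the two unbalanced inputs. Other engines may behave differently for the first variant.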



Answer 2:

Depending on the language you're using, you should be able to use backreferences. Something like this, say:

(["'])(pattern)\1|^(pattern)$

That way, you're requiring that either there are no quotes, or that the SAME quote is used on both ends.
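
Here is a rough sketch of how that regex behaves in Python, using it unchanged. The names EITHER_QUOTE and quote_used are hypothetical helpers for this example only.

```python
import re

# The regex from the answer, unchanged. Only the second alternative is
# anchored, so the quoted form may also match inside a longer string.
EITHER_QUOTE = re.compile(r'''(["'])(pattern)\1|^(pattern)$''')

def quote_used(text):
    # Returns the quote character, '' for the bare form, or None if no match.
    m = EITHER_QUOTE.search(text)
    if m is None:
        return None
    return m.group(1) or ''
```

Mixed quotes such as "pattern' are rejected because \1 must repeat exactly the character that group 1 captured.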



Answer 3:

This should be possible with a recursive regex (which takes longer to get right). In the meantime: in Perl, you can build a regex that generates part of itself at match time, using a postponed subexpression, (??{ ... }). I'll leave that as an academic example ;-)

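use strict;
use warnings;

my @stuff = ( '"pattern"', 'pattern', 'pattern"', '"pattern' );

foreach (@stuff) {
   print "$_ OK\n" if /^
                        (")?                          # optional opening quote, captured as group 1
                        \w+                           # the pattern itself
                        (??{defined $1 ? '"' : ''})   # require a closing quote only if group 1 matched
                       $
                      /x;
}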

Result:

"pattern" OK
pattern OK