Conditional regex in PHP doesn't seem to work

Posted 2019-07-19 09:18

Question:

Performing a regular expression match in PHP using the preg suite, I understand that you can represent a conditional statement right within the regex.

I could hardly find any documentation online so I turned to Jeffrey E.F. Friedl's Mastering Regular Expressions.

The way I see it, something like /(?(?<=NUM:)\d+|\w+)/ should match digits when they are preceded by NUM: and otherwise match a word.

But for some weird reason it always returns true and the match data doesn't make sense to me either. Can someone explain to me what's going on?

What I want to do is this:

preg_replace('/creat(?:e|ing)/i', 'make', $input)
but only when '/creat(?:e|ing)/i' is not surrounded by quotes.

In action, the input/output pairs I need are:

  1. input: create a white shirt.

output: make a white shirt.

  2. input: "create a white shirt."

output: "create a white shirt."

  3. input: hello create some good code.

output: hello make some good code.

  4. input: "hello" "make some" good "code."

output: "hello" "make some" good "code."

Thank you everybody!

Edit: I want to do something like: if there is an opening quote, make sure it has a closing pair before matching the keyword create in this case. Hope that makes sense and is possible.

Answer 1:

You do not need any conditional constructs to skip what is inside quotes. There are two ways.

The first is to add an alternation branch that matches a quoted substring and to discard that match with the (*SKIP)(*FAIL) verbs:

 $output = preg_replace('/"[^"]*"(*SKIP)(*F)|creat(?:e|ing)/i', 'make', $input);

Pattern details:

  • "[^"]*" - matches ", then 0+ characters other than " and then a "
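  • (*SKIP)(*F) - (*F) forces the match to fail at this point, and (*SKIP) stops the engine from retrying inside the text just consumed, so the quoted substring is discarded and the search resumes right after it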
  • | - or...
  • creat(?:e|ing) - match create or creating.
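
As a rough check, the pattern above can be run against the four sample inputs from the question. This is only a sketch: the function name replaceOutsideQuotes and the $samples array are hypothetical and introduced here for illustration.

```php
<?php
// Hypothetical helper name; the pattern itself is the one from the answer.
function replaceOutsideQuotes(string $input): string
{
    // A double-quoted substring is matched and then thrown away by
    // (*SKIP)(*F); only create/creating outside quotes gets replaced.
    return preg_replace('/"[^"]*"(*SKIP)(*F)|creat(?:e|ing)/i', 'make', $input);
}

// Sample inputs taken from the question.
$samples = [
    'create a white shirt.',
    '"create a white shirt."',
    'hello create some good code.',
    '"hello" "make some" good "code."',
];

foreach ($samples as $sample) {
    echo replaceOutsideQuotes($sample), PHP_EOL;
}
// Prints:
// make a white shirt.
// "create a white shirt."
// hello make some good code.
// "hello" "make some" good "code."
```

Because "[^"]*" only matches when a closing quote exists, a lone opening quote is not treated as the start of a quoted substring, so a keyword after it is still replaced.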


The second way is to capture the quoted substring in a group and use preg_replace_callback, where you can check whether that group matched and choose the replacement accordingly:

 $output = preg_replace_callback('/("[^"]*")|creat(?:e|ing)/i', function($m) {
     // Group 1 is set only for a quoted substring: put it back unchanged
     return !empty($m[1]) ? $m[1] : 'make';
 }, $input);


Pattern details:

  • ("[^"]*") - Group 1 (available as $m[1] inside the callback) - a double-quoted string
  • | - or
  • creat(?:e|ing) - match create or creating.
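
A minimal sketch of the callback approach wrapped in a function, so it can be tried on a string that mixes quoted and unquoted keywords. The name replaceWithCallback and the sample sentence are assumptions made for this example.

```php
<?php
// Hypothetical wrapper around the callback approach described above.
function replaceWithCallback(string $input): string
{
    return preg_replace_callback(
        '/("[^"]*")|creat(?:e|ing)/i',
        function (array $m): string {
            // Group 1 is present only when the quoted-string branch matched;
            // in that case the quoted text is returned unchanged.
            return isset($m[1]) && $m[1] !== '' ? $m[1] : 'make';
        },
        $input
    );
}

echo replaceWithCallback('I am creating "create" tables'), PHP_EOL;
// Prints: I am make "create" tables
```

Compared with (*SKIP)(*F), this version relies only on ordinary capturing groups, which may be easier to port to regex engines that lack backtracking control verbs.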

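Note that "[^"]*" is only a sample pattern. If you need to match C-style strings that may contain escape sequences, use at least "[^"\\\\]*(?:\\\\.[^"\\\\]*)*" (shown as it is written inside a single-quoted PHP string literal, where \\\\ reaches the regex engine as \\ and matches one literal backslash).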
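
To illustrate the difference, here is a sketch that plugs the escape-aware pattern into the (*SKIP)(*F) approach and compares it with the simple pattern on an input containing escaped quotes. The function names and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: simple quoted-string pattern, no escape handling.
function replaceSimple(string $input): string
{
    return preg_replace('/"[^"]*"(*SKIP)(*F)|creat(?:e|ing)/i', 'make', $input);
}

// Hypothetical helper: quoted-string pattern that allows backslash escapes.
function replaceEscapeAware(string $input): string
{
    // In this single-quoted literal, \\\\ becomes \\ in the regex,
    // which matches one literal backslash in the subject.
    return preg_replace(
        '/"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"(*SKIP)(*F)|creat(?:e|ing)/i',
        'make',
        $input
    );
}

// The subject contains literal backslash-quote pairs inside the quotes.
$input = '"say \"create\" now" then create';

echo replaceSimple($input), PHP_EOL;
// Prints: "say \"make\" now" then make
echo replaceEscapeAware($input), PHP_EOL;
// Prints: "say \"create\" now" then make
```

The simple pattern ends the quoted substring at the first escaped quote, so the keyword inside is replaced; the escape-aware pattern treats the whole quoted string as one unit.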