Recursive regex not matching template blocks

Posted 2019-07-20 15:45

Question:

I'm trying to understand more about regex, and in this case the recursion you can do in a regular expression.

I'm trying to match a nested block of {foreach $VAR} ... {/foreach}. But for some reason my regex isn't matching and I don't understand why.

I was hoping someone here could shed some light on it. I'm not interested in a quick regex fix, but rather in why my regex isn't doing what I expect. What exactly is happening?

This is the code I have:

<?php
$str = 'start of text
{foreach $ABC}
  in 1st loop
  {foreach $XYZ}
    in 2nd loop
  {/foreach}
{/foreach}
some other stuff';

if ( preg_match ( '#{foreach \$.*?}((?!foreach)|(?R))*{/foreach}#', $str, $matches ) )
{
    print_r($matches);
}
else
{
    echo 'No match';
}

Here is a breakdown of my regex as I think it works:

{foreach \$     # match literally "{foreach $"
.*?}            # followed by any characters, ending with a '}'
(               # start a group
  (?!foreach)   # match any character, as long as it's not the sequence 'foreach'
  |             # otherwise
  (?R)          # do a recursion
)               # end of group
*               # match 0 or more times with backtracking...
{/foreach}      # ...backtracking until you find the last {/foreach}

That's how I think the regex works, but obviously that is not the case. So my question is: where am I wrong in my explanation?

You can play around with this code here: http://codepad.viper-7.com/508V9w


Just to clarify.

I'm trying to get the contents of each foreach block. So in my case:

arr[0] => in 1st loop
      {foreach $XYZ}
        in 2nd loop
      {/foreach}
arr[1] => in 2nd loop

OR -

arr[0] => {foreach $ABC}
      in 1st loop
      {foreach $XYZ}
        in 2nd loop
      {/foreach}
    {/foreach}
arr[1] => {foreach $XYZ}
        in 2nd loop
      {/foreach}

Either will do fine.

Answer 1:

First of all, by default the . matches any character except a newline. To have it match newlines as well, you have to set the s modifier.
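
A quick way to see what the s modifier changes is the minimal check below. The sample string and variable names are made up for illustration and are not part of the question's code.

```php
<?php
// Hypothetical sample: a newline sits between "a" and "b".
$sample = "a\nb";

// Without the s modifier, the dot refuses to match the newline.
$withoutS = preg_match('#a.b#', $sample);

// With the s modifier (PCRE_DOTALL), the dot matches the newline too.
$withS = preg_match('#a.b#s', $sample);

var_dump($withoutS, $withS); // int(0) and int(1)
```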

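And secondly, the group ((?!foreach)|(?R))* contains only an assertion and a recursion, but no actual characters to match. The lookahead (?!foreach) is zero-width: it checks what comes next without consuming anything, so it does not mean "match any character that is not part of 'foreach'". You need something that consumes text, at least a dot, before the * quantifier.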
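
The zero-width behaviour can be checked in isolation. This is only a sketch: the one-line sample string is hypothetical (chosen so that newlines play no role), the recursion is left out on purpose, and the braces are escaped merely to make them unambiguous literals.

```php
<?php
// Hypothetical one-line sample, so the s modifier is irrelevant here.
$sample = '{foreach $A}body{/foreach}';

// The group holds only a lookahead: it consumes nothing, so the regex
// can never get past "body" to reach the closing tag.
$assertionOnly = preg_match('#\{foreach \$.*?\}((?!foreach))*\{/foreach\}#', $sample);

// A dot after the lookahead consumes one character per repetition.
$withDot = preg_match('#\{foreach \$.*?\}((?!foreach).)*\{/foreach\}#', $sample);

var_dump($assertionOnly, $withDot); // int(0) and int(1)
```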

#{foreach \$.*?}((?!foreach)|(?R)).*{/foreach}#s gives the following result with your test text:

Array
(
    [0] => {foreach $ABC}
  in 1st loop
  {foreach $XYZ}
    in 2nd loop
  {/foreach}
{/foreach}
    [1] => 
)
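
For the clarified goal of getting every foreach block, one possible approach is sketched below. It is not the pattern from this answer, and it rests on assumptions: tags are written exactly as {foreach $NAME} and {/foreach}, every block is properly closed, and the names $pattern and $blocks are chosen here for illustration. Captures set inside a recursion are not kept once the recursion returns, so the sketch wraps the recursive group in a lookahead and lets preg_match_all() retry at every position, which also finds the nested blocks.

```php
<?php
$str = 'start of text
{foreach $ABC}
  in 1st loop
  {foreach $XYZ}
    in 2nd loop
  {/foreach}
{/foreach}
some other stuff';

// Group 1 is one complete block. Inside it, each repetition either consumes
// a single character that does not start a foreach tag, or recurses into
// group 1 via (?1) to swallow a whole nested block. The surrounding lookahead
// keeps the overall match zero-width, so blocks nested inside an earlier
// match are still found on later attempts.
$pattern = '~(?=(\{foreach \$[^}]*\}(?:(?!\{/?foreach).|(?1))*\{/foreach\}))~s';

preg_match_all($pattern, $str, $matches);

$blocks = $matches[1]; // outer block first, then the inner block
print_r($blocks);
```

With the test text, $blocks should hold two entries: the full {foreach $ABC} block and the {foreach $XYZ} block nested inside it, which corresponds to the second output format asked for in the question.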


Tags: php regex pcre