Can regular expressions be used to match nested pa

2018-12-30 23:36发布

Is it possible to write a regular expression that matches a nested pattern that occurs an unknown number of times? For example, can a regular expression match an opening and closing brace when there are an unknown number of open/close braces nested within the outer braces?

For example:

public MyMethod()
{
  if (test)
  {
    // More { }
  }

  // More { }
} // End

Should match:

{
  if (test)
  {
    // More { }
  }

  // More { }
}

13条回答
余生请多指教
2楼-- · 2018-12-31 00:12

Using the recursive matching in the PHP regex engine is massively faster than procedural matching of brackets. especially with longer strings.

http://php.net/manual/en/regexp.reference.recursive.php

e.g.

$patt = '!\( (?: (?: (?>[^()]+) | (?R) )* ) \)!x';

preg_match_all( $patt, $str, $m );

vs.

matchBrackets( $str );

function matchBrackets ( $str, $offset = 0 ) {

    $matches = array();

    list( $opener, $closer ) = array( '(', ')' );

    // Return early if there's no match
    if ( false === ( $first_offset = strpos( $str, $opener, $offset ) ) ) {
        return $matches;
    }

    // Step through the string one character at a time storing offsets
    $paren_score = -1;
    $inside_paren = false;
    $match_start = 0;
    $offsets = array();

    for ( $index = $first_offset; $index < strlen( $str ); $index++ ) {
        $char = $str[ $index ];

        if ( $opener === $char ) {
            if ( ! $inside_paren ) {
                $paren_score = 1;
                $match_start = $index;
            }
            else {
                $paren_score++;
            }
            $inside_paren = true;
        }
        elseif ( $closer === $char ) {
            $paren_score--;
        }

        if ( 0 === $paren_score ) {
            $inside_paren = false;
            $paren_score = -1;
            $offsets[] = array( $match_start, $index + 1 );
        }
    }

    while ( $offset = array_shift( $offsets ) ) {

        list( $start, $finish ) = $offset;

        $match = substr( $str, $start, $finish - $start );
        $matches[] = $match;
    }

    return $matches;
}
查看更多
回忆,回不去的记忆
3楼-- · 2018-12-31 00:12

No, you are getting into the realm of Context Free Grammars at that point.

查看更多
回忆,回不去的记忆
4楼-- · 2018-12-31 00:13

Using regular expressions to check for nested patterns is very easy.

'/(\((?>[^()]+|(?1))*\))/'
查看更多
何处买醉
5楼-- · 2018-12-31 00:14

This seems to work: /(\{(?:\{.*\}|[^\{])*\})/m

查看更多
弹指情弦暗扣
6楼-- · 2018-12-31 00:16

The Pumping lemma for regular languages is the reason why you can't do that.

The generated automaton will have a finite number of states, say k, so a string of k+1 opening braces is bound to have a state repeated somewhere (as the automaton processes the characters). The part of the string between the same state can be duplicated infinitely many times and the automaton will not know the difference.

In particular, if it accepts k+1 opening braces followed by k+1 closing braces (which it should) it will also accept the pumped number of opening braces followed by unchanged k+1 closing brases (which it shouldn't).

查看更多
琉璃瓶的回忆
7楼-- · 2018-12-31 00:20

No. You need a full-blown parser for this type of problem.

查看更多
登录 后发表回答