-->

Find PHP with REGEX

2019-08-19 01:12发布

问题:

I need a REGEX that can find blocks of PHP code in a file. For example:

    <? print '<?xml version="1.0" encoding="UTF-8"?>';?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <?php echo "stuff"; ?>
    </head>
    <html>

When parsed would by the REGEX would return:

array(
    "<? print '<?xml version=\"1.0\" encoding="UTF-8"?>';?>",
    "<? echo \"stuff\"; ?>"
);

You can assume the PHP is valid.

回答1:

With token_get_all you get a list of PHP language tokens of a given PHP code. Then you just need to iterate the list, look for the open tag tokens and for the corresponding close tags.

$blocks = array();
$opened = false;
foreach (token_get_all($code) as $token) {
    if (!$opened) {
        if (is_array($token) && ($token[0] === T_OPEN_TAG || $token[0] === T_OPEN_TAG_WITH_ECHO)) {
            $opened = true;
            $buffer = $token[1];
        }
    } else {
        if (is_array($token)) {
            $buffer .= $token[1];
            if ($token[0] === T_CLOSE_TAG) {
                $opened = false;
                $blocks[] = $buffer;
            }
        } else {
            $buffer .= $token;
        }
    }
}


回答2:

This is the type of task that is much better suited for a custom parser. You could relatively easily construct one using a stack and I can guarantee you will be done much quicker and pull less hair out than you would trying to debug your regex.

Regular expressions are great tools when used appropriately but not all text parsing tasks are equal.



回答3:

Try the following regex using preg_match()

/<\?(?:php)?\s+(.*?)\?>/

That's untested, but is a start. It assumes a closing PHP tag (arguably well-formed).



回答4:

Try this regex(untested):

preg_match_all('@<\?.*?\?>@si',$html,$m);
print_r($m[0]);


回答5:

<\?(?:php)?\s+.*?\?>$

with the following modifiers:

Dot match newlines

^& match at line breaks