Regex Match PHP Comment

Posted 2020-08-04 10:57

Question:

I've been trying to match PHP comments using regex.

//([^<]+)\r\n

That's what I've got, but it doesn't really work.

I've also tried

//([^<]+)\r
//([^<]+)\n
//([^<]+)

...to no avail.

Answer 1:

In what program are you coding this regex? Your final example is a good sanity check if you're worried that the newline chars aren't working. (I have no idea why you don't allow less-than in your comment; I'm assuming that's specific to your application.)

Try

//[^<]+

and see if that works. As Draemon says, you might have to escape the slashes. You might also have to escape the parentheses. I can't tell if you know this, but parentheses are often used to enclose capturing groups. Finally, check whether there is indeed at least one character after the double slashes.
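As a rough illustration of the points above, here is a minimal sketch in PHP. The sample input and variable names are made up for the example. It uses # as the pattern delimiter so the slashes need no escaping, and it shows why the original character class runs past the end of the line: [^<] also matches line breaks.

```php
<?php
// Hypothetical sample input; not taken from the original question.
$source = "<?php\n\$a = 1; // first note\n\$b = 2; // second note\n";

// '#' is the pattern delimiter here, so '//' needs no escaping.
// [^<\r\n]* stops at a line break (and at '<', as in the original class).
preg_match_all('#//([^<\r\n]*)#', $source, $perLine);

// The original class [^<]+ also matches line breaks, so it keeps going
// past the end of the line until it reaches '<' or the end of the input.
preg_match_all('#//([^<]+)#', $source, $greedy);

print_r($perLine[1]); // two captures, one per comment
print_r($greedy[1]);  // one capture that spans both lines
```

If the comments may legitimately contain a less-than sign, dropping < from the class (leaving [^\r\n]*) is probably what you want.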



Answer 2:

To match comments, keep in mind that PHP 5 has two main kinds of comments (it also accepts # as a single-line marker, which is not covered here):

  • comments that start with // and go to the end of the line
  • comments that start with /* and go to */

Assuming you start with these two lines:

$filePath = '/home/squale/developpement/astralblog/website/library/HTMLPurifier.php';
$str = file_get_contents($filePath);

You could match the first kind with:

$matches_slashslash = array();
if (preg_match_all('#//(.*)$#m', $str, $matches_slashslash)) {
    var_dump($matches_slashslash[1]);
}

And the second kind with:

$matches_slashstar = array();
if (preg_match_all('#/\*(.*?)\*/#sm', $str, $matches_slashstar)) {
    var_dump($matches_slashstar[1]);
}

But you will probably run into trouble with '//' in the middle of a string (and what about heredoc syntax, did you think about that one?), or with "toggle comments" like this:

/*
echo 'a';
/*/
echo 'b';
//*/

(Just add a slash at the beginning to "toggle" the two blocks, if you don't know the trick.)

So it is quite hard to detect comments with regex alone.
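To make the string problem concrete, here is a small sketch that reuses the line-comment pattern shown earlier in this answer. The input is a made-up snippet that contains no comments at all, only a URL inside a string literal.

```php
<?php
// Hypothetical input: no comments, but '//' appears inside a string literal.
$code = "<?php\n\$url = 'http://example.com/';\necho \$url;\n";

// Same line-comment pattern as above.
$falsePositives = array();
preg_match_all('#//(.*)$#m', $code, $falsePositives);

// The regex cannot tell a string literal from code, so it reports
// the rest of the URL line as if it were a comment.
var_dump($falsePositives[1]);
```

The regex reports one "comment" even though the code has none, which is the kind of false positive to expect from a purely textual match.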


Another way would be to use the PHP Tokenizer, which, obviously, knows how to parse PHP code and comments.

For reference, see:

  • token_get_all
  • List of Parser Tokens

With that, you would run the tokenizer on your string of PHP code, iterate over all the tokens you get as a result, and detect which ones are comments.

Something like this would probably do:

$tokens = token_get_all($str);

foreach ($tokens as $token) {
    // Single-character tokens (';', '{', ...) are plain strings, not arrays.
    if (is_array($token)
        && ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT)) {
        // This is a comment ;-)
        var_dump($token);
    }
}

As output, you'll get a list of entries like this (the numeric token IDs vary between PHP versions):

array
  0 => int 366
  1 => string '/** Version of HTML Purifier */' (length=31)
  2 => int 57

or this:

array
  0 => int 365
  1 => string '// :TODO: make the config merge in, instead of replace
' (length=55)
  2 => int 117

(You might still have to strip the // and /* */ markers, but that's up to you; at least you have extracted the comments ^^ )

If you really want to detect comments without any kind of strange error due to "strange" syntax, I suppose this would be the way to go ;-)
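For the stripping step mentioned above, one possible approach is sketched below. The helper name extractCommentText and the sample input are hypothetical, and the trimming rules are an assumption about what "stripping" should mean, not something the tokenizer prescribes.

```php
<?php
/**
 * Return the text of every comment in $code, without the comment markers.
 * extractCommentText is a hypothetical helper, not part of PHP itself.
 */
function extractCommentText($code)
{
    $texts = array();
    foreach (token_get_all($code) as $token) {
        // Single-character tokens such as ';' come back as plain strings.
        if (!is_array($token)) {
            continue;
        }
        if ($token[0] !== T_COMMENT && $token[0] !== T_DOC_COMMENT) {
            continue;
        }
        $text = $token[1];
        if (strncmp($text, '/*', 2) === 0) {
            // Block or doc comment: drop the opening '/*' and closing '*/',
            // then any extra leading '*' left over from '/**'.
            $text = ltrim((string) substr($text, 2, -2), '*');
        } else {
            // Line comment: drop the leading '//' or '#'.
            $text = preg_replace('~^(?://|#)~', '', $text);
        }
        // Remove surrounding whitespace, including a trailing newline.
        $texts[] = trim($text);
    }
    return $texts;
}

$sample = "<?php\n/** Version */\n\$v = '1.0'; // keep in sync\n";
print_r(extractCommentText($sample));
```

Because the tokenizer decides what is a comment, a '//' inside a string literal is never passed to the stripping code in the first place.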



Answer 3:

You probably need to escape the "//":

\/\/([^<]+)


Answer 4:

This will match comments in PHP (both the /* */ and // forms). Note that a // comment is only matched when a newline follows it:

/(\/\*).*?(\*\/)|(\/\/).*?(\n)/s

To get all matches, use preg_match_all, which collects every match into an array.
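A short usage sketch follows, with a made-up input string. The first call uses the pattern above unchanged. The second pattern is a variant of my own (an assumption, not part of this answer) that also accepts end of input, so a // comment on the last line is not missed.

```php
<?php
// Hypothetical sample; the final line comment has no trailing newline.
$code = "<?php\n/* block */\n\$a = 1; // first\n\$b = 2; // last";

// Pattern from the answer above, unchanged.
preg_match_all('/(\/\*).*?(\*\/)|(\/\/).*?(\n)/s', $code, $found);

// Variant (an assumption, not the answer's pattern): a lookahead accepts
// either a newline or the end of the input after a line comment.
preg_match_all('/\/\*.*?\*\/|\/\/.*?(?=\n|\z)/s', $code, $foundAll);

print_r($found[0]);    // the block comment and "// first\n" only
print_r($foundAll[0]); // all three comments, without trailing newlines
```

Both patterns still treat a '//' or '/*' inside a string literal as a comment, so the same caveats as for any regex-only approach apply.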