Why is my PHP regex that parses Markdown links broken?

Posted 2020-02-26 12:28

Question:

$pattern = "/\[(.*?)\]\((.*?)\)/i";
$replace = "<a href=\"$2\" rel=\"nofollow\">$1</a>";
$text = "blah blah [LINK1](http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";
echo preg_replace($pattern, $replace, $text);

The above works, but if a space is accidentally inserted between the [] and the (), everything breaks and the two links are merged into one:

$text = "blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";

I have a feeling it's the lazy star that breaks it, but I don't know how else to match repeated links.
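A minimal sketch (variable names are illustrative) that runs the question's pattern on the spaced input with preg_match and looks at the two capture groups. The lazy star is doing what it is told: because the space stops \]\( from matching right after LINK1, .*? keeps extending until the next "](", which belongs to LINK2.

```php
<?php
// Run the question's pattern on the input with a space between "]" and "(".
$pattern = '/\[(.*?)\]\((.*?)\)/i';
$text = 'blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?';

preg_match($pattern, $text, $m);

// $m[1] is the link text, $m[2] the URL. The lazy .*? could not find "]("
// right after LINK1, so it kept extending up to the "](" of LINK2.
echo $m[1], PHP_EOL; // LINK1] (http://example.com) blah [LINK2
echo $m[2], PHP_EOL; // http://sub.example.com/
```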

Answer 1:

If I understand you correctly, all you really need to do is also match any number of spaces between the two, for example:

/\[([^]]*)\] *\(([^)]*)\)/i

Explanation:

\[             # Match the opening square bracket (escaped)
([^]]*)        # Capture any number of characters that are not closing square brackets
\]             # Match the closing square bracket (escaped)
 *             # Match any number of spaces (including none)
\(             # Match the opening parenthesis (escaped)
([^)]*)        # Capture any number of characters that are not closing parentheses
\)             # Match the closing parenthesis (escaped)
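The same pattern can be written in PHP's extended (/x) mode so that the explanation lives inside the regex. This is a sketch, not part of the original answer: in /x mode unescaped whitespace outside a character class is ignored, so the space quantifier is written as [ ]* instead of " *", and the comments must not contain the / delimiter.

```php
<?php
// Answer 1's pattern in extended (/x) mode; whitespace and #-comments
// outside character classes are ignored by PCRE.
$pattern = '/
    \[          # opening square bracket
    ([^]]*)     # group 1: link text, no closing square brackets
    \]          # closing square bracket
    [ ]*        # any number of spaces (including none)
    \(          # opening parenthesis
    ([^)]*)     # group 2: URL, no closing parentheses
    \)          # closing parenthesis
/x';

// Single quotes keep $1 and $2 literal for preg_replace.
$replace = '<a href="$2" rel="nofollow">$1</a>';
$text = 'blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?';

$result = preg_replace($pattern, $replace, $text);
echo $result, PHP_EOL;
```

Both links are converted separately, because neither capture group can run past its own closing bracket.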

Justification:

I should probably explain why I changed your .*? into [^]]*.

The second version is more efficient because it does not need the character-by-character backtracking that .*? does. Additionally, once an opening [ is encountered, the .*? version keeps scanning forward until it finds a match, instead of failing when the bracket turns out not to start a link, which is what we want. For example, if we match the expression using .*? against:

Sad face :[ blah [LINK1](http://sub.example.com/) blah

the bracketed part of the match will be

[ blah [LINK1]

and the URL will be

http://sub.example.com/

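Note that [^]]* on its own does not fix this particular input: [ is not ], so [^]]* still runs from the sad face's [ to the ] after LINK1 and produces the same match. For the input to be matched correctly, the link-text class has to exclude both brackets, e.g. [^\[\]]*: the attempt starting at :[ then fails at the second [, and the engine moves on to [LINK1].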
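A small comparison sketch on the sad-face input. The [^\[\]]* variant is not from the original answer, and the array keys are only labels; each pattern's group 1 is recorded so the difference is visible.

```php
<?php
// What does each pattern capture as link text (group 1) on this input?
$text = 'Sad face :[ blah [LINK1](http://sub.example.com/) blah';

$patterns = [
    'lazy .*?' => '/\[(.*?)\] *\((.*?)\)/',
    '[^]]*'    => '/\[([^]]*)\] *\(([^)]*)\)/',
    '[^\[\]]*' => '/\[([^\[\]]*)\] *\(([^)]*)\)/', // variant: excludes [ and ]
];

$captured = [];
foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $text, $m);
    $captured[$label] = $m[1];
    // Quote the capture so a leading space is visible.
    echo $label, ' => "', $m[1], '"', PHP_EOL;
}
```

Only the last pattern captures LINK1; the first two both start at the sad face's [ and capture " blah [LINK1".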



Answer 2:

Try this:

$pattern = "/\[(.*?)\]\s?\((.*?)\)/i";

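This adds \s? (zero or one whitespace character) between \[(.*?)\] and \((.*?)\), so a single space between ] and ( no longer breaks the match.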
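A sketch of answer 2's pattern on the question's input, plus a hypothetical two-space input to probe the limit of \s?. The \s* line is an assumed variation, not part of the answer; variable names are illustrative.

```php
<?php
// Answer 2's pattern: \s? allows zero or one whitespace character
// between "]" and "(".
$pattern = '/\[(.*?)\]\s?\((.*?)\)/i';
$replace = '<a href="$2" rel="nofollow">$1</a>';

// The question's input, with one space after [LINK1].
$oneSpace = 'blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?';
// Hypothetical input with two spaces after [LINK1].
$twoSpaces = 'blah blah [LINK1]  (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?';

$fixed  = preg_replace($pattern, $replace, $oneSpace);   // both links converted
$merged = preg_replace($pattern, $replace, $twoSpaces);  // LINK1 and LINK2 merged again

// Assumed variation: \s* accepts any amount of whitespace.
$star = preg_replace('/\[(.*?)\]\s*\((.*?)\)/i', $replace, $twoSpaces);

echo $fixed, PHP_EOL, $merged, PHP_EOL, $star, PHP_EOL;
```

Because \s? allows at most one whitespace character, two spaces send the lazy .*? back to merging the links; \s* (like answer 1's " *") tolerates any amount.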