preg_replace within the preg_replace

Posted 2020-05-01 08:54

Right now I'm having trouble replacing tags inside strings that have already been matched. Let's say I have the BBCode [b]bla[/b]; replacing [b] with <b> works for that. But let's say, for testing purposes, that someone writes [b]hi [b]test[/b][/b]. What comes out is "hi [b]test[/b]" with everything bolded, and the inner [b] doesn't get replaced for some reason.

Currently this is my expression: /\[b\](.*)\[\/b\]/
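For reference, here is a minimal sketch of what that expression does with the nested input from the question. The variable names are illustrative only. A single pass replaces just the outer pair, because the greedy (.*) runs to the last [/b]; repeating the replacement until nothing matches picks up the inner pair too.

```php
<?php
// Sample input and pattern taken from the question text.
$input   = '[b]hi [b]test[/b][/b]';
$pattern = '/\[b\](.*)\[\/b\]/';

// One pass: greedy (.*) captures "hi [b]test[/b]", so only the outer pair is replaced.
$once = preg_replace($pattern, '<b>$1</b>', $input);
echo $once, PHP_EOL; // <b>hi [b]test[/b]</b>

// Repeat until a pass makes no replacement; the second pass converts the inner pair.
$looped = $input;
do {
    $looped = preg_replace($pattern, '<b>$1</b>', $looped, -1, $count);
} while ($count > 0);
echo $looped, PHP_EOL; // <b>hi <b>test</b></b>
```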

Sorry, I didn't show my code, I'm new to this.

// Will convert string data into readable data
function ConvertStringData2ReadableData($UglyString) {

$CheckArrays = [
"QUOTE" => "/\[quote=?(.*)\](.*)\[\/quote\]/",
"BOLD" => "/\[b\](.*)\[\/b\]/",
"ITALIC" => "/\[i\](.*)\[\/i\]/",
];

$FanceString = $UglyString;

// QUOTES
do {
    $FanceString = preg_replace_callback(
        $CheckArrays['QUOTE'],
        function($match) {
            if (is_numeric($match[1])) {
                $TPID = GetThreadPoster($match[1]);
                $TPUN = GetUsernameS($TPID);
                $statement = ('<div class="panel panel-default"><div class="panel-heading">'.$match[2].'<br>- <b>'.$TPUN.'</b></div></div>');
            } elseif (!is_numeric($match[1])) {
                $statement = ('<div class="panel panel-default"><div class="panel-heading">'.$match[2].'</div></div>');
            }
            return $statement;
        },
        $FanceString,
        -1,
        $count
    );
} while ($count > 0);

// BOLD
do {
    $FanceString = preg_replace($CheckArrays['BOLD'] , "<b>$1</b>" , $FanceString, -1, $count);
} while ($count > 0);
#$FanceString = preg_replace($CheckArrays['BOLD'] , "<b>$1</b>" , $FanceString, -1);

// ITALIC
do {
    $FanceString = preg_replace($CheckArrays['ITALIC'] , "<i style='all: unset; font-style: italic;'>$1</i>" , $FanceString, -1, $count);
} while ($count > 0);

return($FanceString);

}

2 Answers
Bombasti
#2 · 2020-05-01 09:32

Because you can never fully trust user data, and because BBCode is just as vulnerable as HTML to incorrect parsing by regex, you will never be 100% confident that this method will work. Non-quote tags can be replaced just as easily without regex, so I am reducing the pattern complexity by splitting the logic.

I am implementing a recursive pattern for quote tags (assuming everything will be balanced) and using your do-while() technique -- I think this is the best approach. This will effectively work from outer quote inward on each iteration (while $count is positive).
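To see the recursion token in isolation, here is a small sketch. It uses a simplified pattern written for illustration (no id part, and a lookahead to stop at quote tags), not the pattern used in this answer, and the sample text and <blockquote> output are assumptions. (?R) re-enters the whole pattern, so one match spans a balanced outer pair, and the loop then converts the quotes nested inside it on later passes.

```php
<?php
// Simplified, hypothetical pattern: one balanced [quote]...[/quote] pair.
// (?R) recurses into the whole pattern for nested pairs; the lookahead keeps
// the single-character branch from consuming a quote tag.
$pattern = '~\[quote\]((?:(?R)|(?!\[/?quote\]).)*)\[/quote\]~is';

$text = 'a [quote]outer [quote]inner[/quote] tail[/quote] b';

$passes = 0;
do {
    $text = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[1] is the content of the outermost pair matched on this pass.
            return '<blockquote>' . $m[1] . '</blockquote>';
        },
        $text,
        -1,
        $count
    );
    if ($count > 0) {
        $passes++;
    }
} while ($count > 0);

echo $text, PHP_EOL;   // a <blockquote>outer <blockquote>inner</blockquote> tail</blockquote> b
echo $passes, PHP_EOL; // 2
```

The first pass wraps the outer quote and leaves the inner tags untouched inside the captured text; the second pass converts the inner quote; the third finds nothing and ends the loop.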

Code:

function bbcodequote2html($matches){
    $text=(isset($matches[2])?$matches[2]:'');  // avoid Notices
    if(isset($matches[1]) && ctype_digit($matches[1])){
        $TPID = "#{$matches[1]}"; // GetThreadPoster($match[1]);
        $TPUN = "#{$matches[1]}"; // GetUsernameS($TPID);
        $quotee="<br>- <b>$TPUN</b>";
    }else{
        $quotee='';  // no id value or id is non-numeric default to empty string
    }
    return "<div class=\"panel panel-default\"><div class=\"panel-heading\">$text$quotee</div></div>";
}

$bbcode=<<<BBCODE
[quote=2]Outer Quote[b]bold [b]nested bold[/b][/b]
[i]italic [i]nested italic[/i][/i][quote]Inner Quote 1: (no id)[/quote]
[quote=bitethatapple]Inner Quote 2[quote=1]Inner Quote 3[/quote] still inner quote 2 [quote=mickmackusa]Inner Quote 4[/quote] end of inner quote 2[/quote][/quote]
BBCODE;

$converted=str_replace(
    ['[b]','[/b]','[i]','[/i]'],
    ['<b>','</b>','<i style="all:unset;font-style:italic;">','</i>'],
    $bbcode
);

do{
    $converted=preg_replace_callback('~\[quote(?:=(.+?))?]((?:(?R)|.*?)+)\[/quote]~is','bbcodequote2html',$converted,-1,$count);
}while($count);

echo $converted;

It is difficult for me to display the output in a fashion that is easy to read. You may be best served to run my code on your server and check that the results render as desired.

Output:

<div class="panel panel-default"><div class="panel-heading">Outer Quote<b>bold <b>nested bold</b></b>
<i style="all:unset;font-style:italic;">italic <i style="all:unset;font-style:italic;">nested italic</i></i><div class="panel panel-default"><div class="panel-heading">Inner Quote 1: (no id)</div></div>
<div class="panel panel-default"><div class="panel-heading">Inner Quote 2<div class="panel panel-default"><div class="panel-heading">Inner Quote 3<br>- <b>#1</b></div></div> still inner quote 2 <div class="panel panel-default"><div class="panel-heading">Inner Quote 4</div></div> end of inner quote 2</div></div><br>- <b>#2</b></div></div>
干净又极端
#3 · 2020-05-01 09:49

You could do something like this:

$string = '[b]hi [b]test[/b][/b]';    
do {
    $string = preg_replace('/\[b\](.*)\[\/b\]/', '<b>$1</b>', $string, -1, $count);
} while ($count > 0);

Or just use @Justinas' idea (from the comments on your question) if it's OK to replace every [b] with <b> and every [/b] with </b>, regardless of whether they are in the right order or properly paired.
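A minimal sketch of that plain-replacement idea, with the caveat made visible. The $map array and sample strings are illustrative, not from the original comment.

```php
<?php
// Plain string replacement: no regex and no loop; nested tags are handled for free.
$map = ['[b]' => '<b>', '[/b]' => '</b>'];

$balanced = str_replace(array_keys($map), array_values($map), '[b]hi [b]test[/b][/b]');
echo $balanced, PHP_EOL; // <b>hi <b>test</b></b>

// Caveat: an unmatched tag is converted as well, which leaves unbalanced HTML.
$unbalanced = str_replace(array_keys($map), array_values($map), '[b]never closed');
echo $unbalanced, PHP_EOL; // <b>never closed
```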

Edit: you also need to change your quote regex to this:

/\[quote(?:=(\d+))?\](.*)\[\/quote\]/s

The s flag allows . to match newlines (you probably want to add it to the other patterns too). I also fixed the part that captures the quote ID.
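A short sketch of what the flag and the reworked ID group change in practice. The sample strings are made up for illustration. Note that with (\d+) a non-numeric ID such as [quote=bob] no longer matches this pattern at all.

```php
<?php
$withS    = '/\[quote(?:=(\d+))?\](.*)\[\/quote\]/s';
$withoutS = '/\[quote(?:=(\d+))?\](.*)\[\/quote\]/';

$multiline = "[quote=12]line one\nline two[/quote]";

// Without the s flag, . stops at the newline, so the pattern cannot reach [/quote].
$noS = preg_match($withoutS, $multiline);
var_dump($noS); // int(0)

// With the s flag the whole quote matches: $m[1] is the ID, $m[2] the body.
preg_match($withS, $multiline, $m);
var_dump($m[1], $m[2]);

// The ID is optional: with no "=id" part, group 1 comes back as an empty string.
preg_match($withS, '[quote]no id[/quote]', $n);
var_dump($n[1], $n[2]);

// A non-numeric ID does not match, because only digits are allowed after "=".
$named = preg_match($withS, '[quote=bob]x[/quote]');
var_dump($named); // int(0)
```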
