How can I escape all code within <code> tags?

Posted 2019-07-10 12:37

Question:

What I want to do is to allow users to post code if they need to, so it is viewable and it doesn't render. For example:

<span>
<div id="hkhsdfhu"></div>
</span>
<h1>Hello</h1>

Should be turned into:

&lt;span&gt;
&lt;div id="hkhsdfhu"&gt;&lt;/div&gt;
&lt;/span&gt;
&lt;h1&gt;Hello&lt;/h1&gt;

Only if it is wrapped in <code></code> tags. Right now I am using the following function to allow only certain HTML tags and escape any other tags:

function allowedHtml($str) {
  $allowed_tags = array("b", "strong", "i", "em");
  $sans_tags = str_replace(array("<", ">"), array("&lt;","&gt;"), $str);
  $regex = sprintf("~&lt;(/)?(%s)&gt;~", implode("|",$allowed_tags));
  $with_allowed = preg_replace($regex, "<\\1\\2>", $sans_tags);
  return $with_allowed;
}

However, if a user wraps their code in <code></code> tags and it contains any of the allowed tags in my function above, those tags will render instead of being escaped. How can I make it so that anything inside <code></code> tags gets escaped (or at least has its < and > turned into &lt; and &gt;)? I know about htmlentities(), but I don't want to apply it to the whole post, only to the content inside <code></code> tags.

Thanks in advance!

Answer 1:

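Just use a single preg_replace_callback() call to run htmlentities() on everything it finds within <code> tags. (The e modifier of preg_replace() used to serve this purpose, but it was deprecated in PHP 5.5 and removed in PHP 7.0.)

EDITED

function allowedHtml($str) {
  // Escape everything first, then restore only the whitelisted tag pairs.
  $str = htmlentities($str, ENT_QUOTES, "UTF-8");
  $allowed_tags = array("b", "strong", "i", "em", "code");
  foreach ($allowed_tags as $tag) {
    // The s modifier lets . match newlines, so multi-line content is handled too.
    $str = preg_replace("#&lt;" . $tag . "&gt;(.*?)&lt;/" . $tag . "&gt;#is", "<" . $tag . ">$1</" . $tag . ">", $str);
  }
  return $str;
}

$reply = allowedHtml($_POST['reply']);
// Re-escape the contents of every <code> block, including any tags restored above.
$reply = preg_replace_callback("#<code>(.+?)</code>#is", function ($m) {
  return '<code>' . htmlentities($m[1], ENT_QUOTES, 'UTF-8') . '</code>';
}, $reply);
// Undo the double encoding of & caused by the second htmlentities() pass.
$reply = str_replace("&amp;", "&", $reply);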

Rewrote your allowedHtml() function and added a str_replace() at the end.

It's tested and should now work perfectly :)
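
As a possible alternative to escaping twice and then undoing the &amp; encoding, the post can be split into "inside <code>" and "outside <code>" segments, with each segment escaped exactly once. This is only a sketch: the function name formatPost() and its parameter names are hypothetical and do not appear in the answer above.

```php
<?php
// Hypothetical helper, not part of the original answer.
function formatPost(string $str, array $allowedTags = ['b', 'strong', 'i', 'em']): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates:
    // even indexes = text outside <code>, odd indexes = text inside <code>.
    $parts = preg_split('#<code>(.*?)</code>#is', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '~&lt;(/?)(' . implode('|', array_map('preg_quote', $allowedTags)) . ')&gt;~i';

    $out = '';
    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        if ($i % 2 === 1) {
            // Inside <code>: keep everything escaped.
            $out .= '<code>' . $escaped . '</code>';
        } else {
            // Outside <code>: restore only the whitelisted tags.
            $out .= preg_replace($pattern, '<$1$2>', $escaped);
        }
    }
    return $out;
}
```

Calling formatPost('<b>hi</b> <code><b>x</b></code>') keeps the first <b> pair as real tags and leaves the pair inside <code> escaped.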

UPDATED - NEW SOLUTION

function convertHtml($reply, $revert = false) {
  $specials = array("**", "*", "_", "-");
  $tags = array("b", "i", "u", "s");

  foreach ($tags as $key => $tag) {
    $open = "<" . $tag . ">";
    $close = "</" . $tag . ">";

    if ($revert == true) {
      $special = $specials[$key];
      $reply = preg_replace("#" . $open . "(.+?)" . $close . "#i", $special . "$1" . $special, $reply);
    }
    else {
      $special = str_replace("*", "\*", $specials[$key]);
      $reply = preg_replace("#" . $special . "(.+?)" . $special . "#i", $open . "$1" . $close, $reply);
    }
  }

  return $reply;
}

$reply = htmlentities($reply, ENT_QUOTES, "UTF-8");
$reply = convertHtml($reply);

$reply = preg_replace("#[^\S\r\n]{4}(.+?)(?!.+)#i", "<pre><code>$1</code></pre>", $reply);
$reply = preg_replace("#\</code\>\</pre\>(\s*)\<pre\>\<code\>#i", "$1", $reply);

$reply = nl2br($reply);
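$reply = preg_replace_callback("#<pre><code>(.*?)</code></pre>#s", function ($m) {
  // Inside code blocks, drop the <br /> tags added by nl2br() and revert the inline formatting.
  return '<pre><code>' . convertHtml(str_replace('<br />', '', $m[1]), true) . '</code></pre>';
}, $reply);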

We discussed a different approach, and the code above implements it. It works much like Stack Overflow's post formatting: ** becomes bold, * becomes italic, _ becomes underlined and - becomes strikethrough. On top of that, every line starting with 4 or more spaces is output as code.
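
The "4 or more leading spaces means code" rule can also be written as a line-by-line pass instead of a regex over the whole post. The following is a minimal sketch under that assumption; wrapIndentedCode() is a hypothetical name, and unlike the code above it does not merge code blocks that are separated by blank lines.

```php
<?php
// Hypothetical helper, not part of the original answer.
function wrapIndentedCode(string $text): string
{
    $out = [];
    $code = [];
    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match('/^ {4}(.*)$/', $line, $m)) {
            // Indented line: strip the 4-space marker and collect it as code.
            $code[] = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
            continue;
        }
        if ($code) {
            // A normal line ends the current run of code lines.
            $out[] = '<pre><code>' . implode("\n", $code) . '</code></pre>';
            $code = [];
        }
        $out[] = htmlspecialchars($line, ENT_QUOTES, 'UTF-8');
    }
    if ($code) {
        // Flush a code block that runs to the end of the text.
        $out[] = '<pre><code>' . implode("\n", $code) . '</code></pre>';
    }
    return implode("\n", $out);
}
```

Inline formatting such as ** or * would then be applied only to the lines that were not collected as code.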



Answer 2:

I think you would be better off working directly with the DOM rather than using regular expressions to parse out allowed tags. For example, to traverse the DOM and escape the content of <code> tags, you could do something along the lines of:

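$doc = new DOMDocument();
$doc->loadHTML($postHtml);
// Only the first <code> element is handled here; loop over the node list to cover all of them.
$codeNode = $doc->getElementsByTagName('code')->item(0);
// nodeValue is the element's text content: markup inside <code> has already been parsed into child nodes.
$escapedCode = htmlspecialchars($codeNode->nodeValue);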
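
To make the DOM idea cover every <code> element and keep the tags the user typed, one option is to serialize each element's children back to markup and store that markup as a plain text node. This is a rough sketch, not the answerer's method: escapeCodeWithDom() is a hypothetical name, it assumes the DOM extension is available, and the parser may normalise the markup it reads, so the escaped text can differ slightly from the user's input.

```php
<?php
// Hypothetical helper; the function name is illustrative.
function escapeCodeWithDom(string $html): string
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Copy the live node list first, because the loop modifies the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('code')) as $code) {
        // Serialize the children back to markup (nodeValue would drop the tags).
        $inner = '';
        foreach ($code->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        // Replace the children with a single text node; it is escaped on output.
        while ($code->firstChild) {
            $code->removeChild($code->firstChild);
        }
        $code->appendChild($doc->createTextNode($inner));
    }
    // Returns a full document (doctype, html and body wrappers included).
    return $doc->saveHTML();
}
```

For an input such as '<p>Hi <code><b>x</b></code></p>', the returned document should contain the <b> pair in escaped form inside the <code> element.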


Answer 3:

Here is a way you can do it with a regex replacement. Just make sure you run it before you call your allowedHtml function, so the tags inside <code> are already escaped.

<?php

$post = <<<EOD
I am a person writing a post
How can I write this code?

Example:

<code>
<span>
<div id="hkhsdfhu"></div>
</span>
<h1>Hello</h1>
</code>

Pls help me...
EOD;

$post = preg_replace_callback('/<code>(.*?)<\/code>/is',
                     function ($m) {
                         // The e modifier was removed in PHP 7.0, so escape the match in a callback instead.
                         return '<code>' . htmlspecialchars($m[1]) . '</code>';
                     },
                     $post);

var_dump($post);

Result:

string(199) "I am a person writing a post
How can I write this code?

Example:

<code>
&lt;span&gt;
&lt;div id=&quot;hkhsdfhu&quot;&gt;&lt;/div&gt;
&lt;/span&gt;
&lt;h1&gt;Hello&lt;/h1&gt;
</code>

Pls help me..."
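
One way to combine this with the allowedHtml() function from the question, without changing that function, is to park each <code> block behind a placeholder while allowedHtml() runs. The sketch below makes some assumptions: allowedHtmlWithCode() is a hypothetical wrapper name, the NUL-delimited placeholder format is arbitrary, and NUL bytes are stripped from the input so that users cannot forge placeholders.

```php
<?php
// allowedHtml() as posted in the question.
function allowedHtml($str) {
    $allowed_tags = array("b", "strong", "i", "em");
    $sans_tags = str_replace(array("<", ">"), array("&lt;", "&gt;"), $str);
    $regex = sprintf("~&lt;(/)?(%s)&gt;~", implode("|", $allowed_tags));
    return preg_replace($regex, "<\\1\\2>", $sans_tags);
}

// Hypothetical wrapper: protects <code> blocks while allowedHtml() runs.
function allowedHtmlWithCode($str) {
    $blocks = array();
    $str = str_replace("\x00", '', $str); // placeholders rely on NUL being absent

    // Swap each <code> block for a placeholder and stash its escaped form.
    $str = preg_replace_callback('#<code>(.*?)</code>#is', function ($m) use (&$blocks) {
        $blocks[] = '<code>' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '</code>';
        return "\x00CODE" . (count($blocks) - 1) . "\x00";
    }, $str);

    $str = allowedHtml($str);

    // Put the escaped blocks back in place of the placeholders.
    return preg_replace_callback('#\x00CODE(\d+)\x00#', function ($m) use ($blocks) {
        return $blocks[(int) $m[1]];
    }, $str);
}
```

With this wrapper, allowed tags outside <code> still render, while everything inside <code> stays escaped.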


Answer 4:

Here's one.

$str = preg_replace_callback('/(?<=<code>)(.*?)(?=<\/code>)/si','escape_code',$str);

function escape_code($matches) {

    $tags = array('b','strong','i','em');
    // declare the tags in this array

    $allowed = implode('|',$tags);
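    // escape everything found between <code> and </code>
    $match = htmlentities($matches[0],ENT_NOQUOTES,'UTF-8');
    // note: this turns the whitelisted tags back into real tags, even inside <code>;
    // return $match instead if they should stay escaped there
    return preg_replace('~&lt;(/)?('.$allowed.')(\s*/)?&gt;~i','<$1$2$3>',$match);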
}


Tags: php escaping