What I want to do is allow users to post code if they need to, so that it is viewable and doesn't render. For example:
<span>
<div id="hkhsdfhu"></div>
</span>
<h1>Hello</h1>
Should be turned into:
&lt;span&gt;
&lt;div id="hkhsdfhu"&gt;&lt;/div&gt;
&lt;/span&gt;
&lt;h1&gt;Hello&lt;/h1&gt;
Only if it is wrapped in <code></code> tags. Right now I am using the following function to allow only certain HTML tags and escape any other tags:
function allowedHtml($str) {
    $allowed_tags = array("b", "strong", "i", "em");
    // Escape every tag, then turn the allowed ones back into real tags
    $sans_tags = str_replace(array("<", ">"), array("&lt;", "&gt;"), $str);
    $regex = sprintf("~&lt;(/)?(%s)&gt;~", implode("|", $allowed_tags));
    $with_allowed = preg_replace($regex, "<\\1\\2>", $sans_tags);
    return $with_allowed;
}
However, if a user wraps their code in <code></code> tags and it contains any of the allowed tags from my function above, those tags render instead of being escaped. How can I make everything inside <code></code> tags get escaped (or just have the < and > turned into &lt; and &gt;)? I know about htmlentities(), but I don't want to apply it to the whole post, only to what is inside <code></code> tags.
Thanks in advance!
Just use a single preg_replace() call with the e modifier to run htmlentities() on everything it finds within <code> tags.
EDITED
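function allowedHtml($str) {
    // Escape everything, then turn the allowed tags back into real tags
    $str = htmlentities($str, ENT_QUOTES, "UTF-8");
    $allowed_tags = array("b", "strong", "i", "em", "code");
    foreach ($allowed_tags as $tag) {
        $str = preg_replace("#&lt;" . $tag . "&gt;(.*?)&lt;/" . $tag . "&gt;#is", "<" . $tag . ">$1</" . $tag . ">", $str);
    }
    return $str;
}
$reply = allowedHtml($_POST['reply']);
// Escape the contents of <code> again (the s modifier lets a block span several lines)
$reply = preg_replace("#\<code\>(.+?)\</code\>#se", "'<code>'.htmlentities('$1', ENT_QUOTES, 'UTF-8').'</code>'", $reply);
// Undo the double encoding of entities that allowedHtml() had already escaped
$reply = str_replace("&amp;", "&", $reply);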
I rewrote your allowedHtml() function and added a str_replace() at the end to undo the double encoding inside the code tags. It's tested and should now work :)
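On PHP 7.0 and later the e modifier no longer exists (it was deprecated in PHP 5.5), so the preg_replace() call above fails there. Below is a minimal sketch of the same step using preg_replace_callback(). It assumes the input has already been through the allowedHtml() function above, and the name escapeCodeContents() is hypothetical. The sketch passes false as htmlentities()'s fourth argument ($double_encode), so the entities allowedHtml() already produced are left alone. This takes the place of the trailing str_replace().

```php
<?php
// Hypothetical helper: escapes everything between <code> and </code> in a string
// that allowedHtml() has already processed (so <code> itself is a real tag again).
function escapeCodeContents(string $reply): string
{
    return preg_replace_callback(
        '#<code>(.*?)</code>#is', // s lets the match span several lines
        function (array $m): string {
            // double_encode = false: keep existing entities such as &lt; as they are,
            // and only encode the raw tags (e.g. <b>) that allowedHtml() restored.
            return '<code>' . htmlentities($m[1], ENT_QUOTES, 'UTF-8', false) . '</code>';
        },
        $reply
    );
}

echo escapeCodeContents('Hi <b>there</b> <code>&lt;span&gt;<b>x</b></code>');
// prints: Hi <b>there</b> <code>&lt;span&gt;&lt;b&gt;x&lt;/b&gt;</code>
```

Tags outside <code> (here the <b>there</b>) pass through unchanged.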
UPDATED - NEW SOLUTION
function convertHtml($reply, $revert = false) {
    $specials = array("**", "*", "_", "-");
    $tags = array("b", "i", "u", "s");
    foreach ($tags as $key => $tag) {
        $open = "<" . $tag . ">";
        $close = "</" . $tag . ">";
        if ($revert) {
            // Turn <b>text</b> back into **text** (and so on for the other tags)
            $special = $specials[$key];
            $reply = preg_replace("#" . $open . "(.+?)" . $close . "#i", $special . "$1" . $special, $reply);
        } else {
            // Turn **text** into <b>text</b>; preg_quote() escapes the * characters
            $special = preg_quote($specials[$key], "#");
            $reply = preg_replace("#" . $special . "(.+?)" . $special . "#i", $open . "$1" . $close, $reply);
        }
    }
    return $reply;
}
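// Escape everything first, then turn the markup characters into tags
$reply = htmlentities($reply, ENT_QUOTES, "UTF-8");
$reply = convertHtml($reply);
// Wrap every line that starts with four whitespace characters (spaces or tabs) in <pre><code>
$reply = preg_replace("#^[^\S\r\n]{4}(.+)$#m", "<pre><code>$1</code></pre>", $reply);
// Merge consecutive code lines into a single block
$reply = preg_replace("#\</code\>\</pre\>(\s*)\<pre\>\<code\>#i", "$1", $reply);
$reply = nl2br($reply);
// Inside code blocks, drop the <br /> tags and turn the formatting tags back into their characters
$reply = preg_replace("#\<pre\>\<code\>(.*?)\</code\>\</pre\>#se", "'<pre><code>'.convertHtml(str_replace('<br />', '', '$1'), true).'</code></pre>'", $reply);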
We discussed another solution, and the code above implements it. It works similarly to Stack Overflow's formatting: **text** becomes bold, *text* italic, _text_ underlined and -text- strikethrough. On top of that, every line starting with 4 or more spaces is output as code.
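The last line of that code also relies on the e modifier, which PHP 7.0 removed. Below is a sketch of that step with preg_replace_callback(), under one simplifying assumption: instead of calling convertHtml($code, true), it maps the four tags back to their characters with strtr(). For properly paired, non-empty tags this gives the same result. The function name revertCodeBlocks() is hypothetical.

```php
<?php
// Hypothetical replacement for the final /e call: inside each <pre><code> block,
// remove the <br /> tags added by nl2br() and turn the formatting tags back into
// the characters the user typed.
function revertCodeBlocks(string $reply): string
{
    // Simplified stand-in for convertHtml($code, true), valid for paired tags
    $markers = [
        '<b>' => '**', '</b>' => '**',
        '<i>' => '*',  '</i>' => '*',
        '<u>' => '_',  '</u>' => '_',
        '<s>' => '-',  '</s>' => '-',
    ];
    return preg_replace_callback(
        '#<pre><code>(.*?)</code></pre>#s',
        function (array $m) use ($markers): string {
            $code = str_replace('<br />', '', $m[1]);
            return '<pre><code>' . strtr($code, $markers) . '</code></pre>';
        },
        $reply
    );
}

echo revertCodeBlocks("Text<br />\n<pre><code>a <b>x</b><br />\nb</code></pre>");
```

Only the <br /> tags inside the code block are removed. The one after "Text" stays, because it lies outside any <pre><code> match.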
I think you would be better off working directly with the DOM rather than using regular expressions to parse out the allowed tags. For example, to traverse the DOM and escape the content of a <code> element, you could do something along these lines:
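$doc = new DOMDocument();
$doc->loadHTML($postHtml);
$codeNode = $doc->getElementsByTagName('code')->item(0);
$innerHtml = ''; // nodeValue would drop the child tags, so serialize the children instead
foreach ($codeNode->childNodes as $child) {
    $innerHtml .= $doc->saveHTML($child);
}
$escapedCode = htmlspecialchars($innerHtml);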
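Taking the DOM idea a bit further, here is a sketch that escapes the contents of every <code> element in a post. It replaces each element's children with a single text node holding their serialized HTML, and DOMDocument escapes text nodes when it saves. The function name escapeCodeElements(), the wrapper <div> and the UTF-8 hint are assumptions of this sketch. Note that libxml's HTML parser may repair malformed markup before it is escaped, so the result is not always byte-for-byte what the user typed.

```php
<?php
// Hypothetical helper: escapes the contents of every <code> element in an HTML fragment.
function escapeCodeElements(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // user HTML is often malformed
    // The xml hint makes libxml read the input as UTF-8; the wrapper <div> gives
    // the fragment a single parent whose children we can serialize afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first, because the loop below changes the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('code')) as $code) {
        $inner = '';
        foreach ($code->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        while ($code->firstChild !== null) {
            $code->removeChild($code->firstChild);
        }
        // Text nodes are escaped (< becomes &lt;) when the document is saved.
        $code->appendChild($doc->createTextNode($inner));
    }

    // The first <div> in document order is the wrapper added above.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo escapeCodeElements('<p>Look: <code><b>bold</b></code></p>');
```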
Here is a way you can do it with preg_replace(). Just make sure you call it before you call your allowedHtml() function, so the tags are already replaced.
<?php
$post = <<<EOD
I am a person writing a post
How can I write this code?
Example:
<code>
<span>
<div id="hkhsdfhu"></div>
</span>
<h1>Hello</h1>
</code>
Pls help me...
EOD;
$post = preg_replace('/<code>(.*?)<\/code>/ise',
"'<code>' . htmlspecialchars('$1') . '</code>'",
$post);
var_dump($post);
Result:
string(201) "I am a person writing a post
How can I write this code?
Example:
<code>
&lt;span&gt;
&lt;div id=\&quot;hkhsdfhu\&quot;&gt;&lt;/div&gt;
&lt;/span&gt;
&lt;h1&gt;Hello&lt;/h1&gt;
</code>
Pls help me..."
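The backslashes in front of the quotes in that result come from the e modifier itself. It escapes quotes and backslashes in backreferences such as '$1' before evaluating the replacement code. The modifier was also removed in PHP 7.0. Below is a sketch of the same replacement with preg_replace_callback(), which passes the match to the function untouched. The shortened $post string is illustrative.

```php
<?php
$post = "Example:\n<code>\n<div id=\"hkhsdfhu\"></div>\n</code>\nPls help me...";

// The callback receives the raw match, so no backslashes are added to the quotes.
$post = preg_replace_callback(
    '/<code>(.*?)<\/code>/is',
    function (array $m): string {
        return '<code>' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '</code>';
    },
    $post
);

var_dump($post);
```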
Here's one.
$str = preg_replace_callback('/(?<=<code>)(.*?)(?=<\/code>)/si', 'escape_code', $str);
function escape_code($matches) {
    // declare the tags in this array
    $tags = array('b', 'strong', 'i', 'em');
    $allowed = implode('|', $tags);
    $match = htmlentities($matches[0], ENT_NOQUOTES, 'UTF-8');
    return preg_replace('~<(/)?('.$allowed.')(\s*/)?>~i', '<$1$2$3>', $match);
}