How can I strip off certain html tags and allow some of them?
For instance, I want to strip off span tags but allow the span with underline:
<span style="text-decoration: underline;">Text</span>
I want to allow p, but I want to remove any styles or classes inside the p. For instance:
<p class="99light">Text</p>
The class inside the p tag should be removed. I just want a clean p tag.
This is the line I have so far:
strip_tags($content, '<p><a><br><em><strong><ul><li>');
You can't do that with strip_tags alone. You'll need to use an XML/HTML parser:
// With DOMDocument it might look something like this.
$dom = new DOMDocument();
$dom->loadHTML( $content );
foreach( $dom->getElementsByTagName( "p" ) as $p )
{
    // To remove all attributes from a p tag, loop until none are left.
    // (Iterating $p->attributes directly while removing can skip entries.)
    /*
    while( $p->attributes->length > 0 )
    {
        $p->removeAttributeNode( $p->attributes->item( 0 ) );
    }
    */
    // Remove only the style attribute. removeAttribute() does nothing
    // when the attribute is absent, so no existence check is needed.
    $p->removeAttribute( "style" );
}
echo $dom->saveHTML();
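To cover both cases from the question (clean p tags, and span kept only when it is underlined), the same idea could be extended roughly as follows. This is only a sketch: `cleanHtml` and `stripAttributes` are hypothetical helper names, the wrapper `div` is just a trick to get a fragment back out of DOMDocument, and the regular expression is a simplistic assumption about how the underline style is written.

```php
<?php
// Hypothetical helper: removes every attribute from one element.
function stripAttributes(DOMElement $el): void
{
    while ($el->attributes->length > 0) {
        $el->removeAttribute($el->attributes->item(0)->nodeName);
    }
}

// Hypothetical helper: whitelist tags, clean <p>, unwrap non-underline <span>.
function cleanHtml(string $content): string
{
    // First pass: drop every tag outside the whitelist (span is kept for now).
    $content = strip_tags($content, '<p><a><br><em><strong><ul><li><span>');

    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $dom = new DOMDocument();
    // strip_tags removed any div, so the wrapper is the only div in the document.
    $dom->loadHTML('<div>' . $content . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the live node list before removing nodes from the tree.
    foreach (iterator_to_array($dom->getElementsByTagName('span')) as $span) {
        $style = strtolower($span->getAttribute('style'));
        if (preg_match('/text-decoration\s*:\s*underline/', $style)) {
            // Keep the span, but reduce it to the underline style only.
            stripAttributes($span);
            $span->setAttribute('style', 'text-decoration: underline;');
            continue;
        }
        // Unwrap: move the children in front of the span, then drop the span.
        while ($span->firstChild) {
            $span->parentNode->insertBefore($span->firstChild, $span);
        }
        $span->parentNode->removeChild($span);
    }

    // Attribute changes do not alter the node list, so direct iteration is fine.
    foreach ($dom->getElementsByTagName('p') as $p) {
        stripAttributes($p);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($dom->getElementsByTagName('div')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Note that this leaves attributes on the other allowed tags (a, li and so on) untouched and does nothing special for non-ASCII input, so treat it as a starting point rather than a sanitizer.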
You need full DOM parsing. strip_tags will not offer the necessary security and customization. I have used the HTMLPurifier library in the past for this. It does actual parsing and allows you to set whitelists while taking care of malicious input and producing valid markup!
By "necessary security" I mean that if you try to write a custom parser you will make a mistake (don't worry, I would too), and by "customization" I mean that no built-in solution will let you target only certain tags with certain attributes and values of those attributes. HTMLPurifier is a PHP library built for exactly this.
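For the tags in the question, a whitelist configuration might look something like the sketch below. It assumes HTMLPurifier is installed through Composer and that the autoloader lives at `vendor/autoload.php`; the sample `$content` is made up for illustration. As far as I know, restricting CSS properties this way still lets other `text-decoration` values (such as `line-through`) through, so check the result against your own input.

```php
<?php
// Assumption: HTMLPurifier was installed with Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

// Illustrative input only.
$content = '<p class="99light">Text <span style="color: red;">red</span> '
         . '<span style="text-decoration: underline;">underlined</span></p>';

$config = HTMLPurifier_Config::createDefault();
// Allowed elements and attributes; anything not listed (e.g. class on p) is dropped.
$config->set('HTML.Allowed', 'p,a[href],br,em,strong,ul,li,span[style]');
// Only this CSS property may remain inside a style attribute.
$config->set('CSS.AllowedProperties', array('text-decoration'));
// Spans that end up with no attributes after filtering are unwrapped.
$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);

$purifier = new HTMLPurifier($config);
echo $purifier->purify($content);
```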