strip_tags: strip off the messy tags and styles

Posted 2019-07-16 18:06

Question:

How can I strip certain HTML tags while allowing others?

For instance,

I want to strip off span tags but allow the span with underline.

<span style="text-decoration: underline;">Text</span>

I want to allow p, but remove any styles or classes on it. For instance, in

<p class="99light">Text</p>

the class on the p tag should be removed. I just want a clean p tag.

This is the line I have so far:

strip_tags($content, '<p><a><br><em><strong><ul><li>');

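Answer 1:

You can't do this with strip_tags alone: it filters whole tags by name and cannot inspect their attributes. You'll need to use an XML/HTML parser to do that: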

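// With DOMDocument it might look something like this.
$dom = new DOMDocument();
// Collect parser warnings about imperfect markup instead of printing them.
libxml_use_internal_errors( true );
$dom->loadHTML( $content );
libxml_clear_errors();
foreach( $dom->getElementsByTagName( "p" ) as $p )
{
    // Removes all attributes from a p tag. The attribute map is live, so
    // removing inside a foreach over it can skip entries; drain it instead.
    /*
    while( $p->attributes->length > 0 )
    {
        $p->removeAttributeNode( $p->attributes->item( 0 ) );
    }
    */
    // Remove only the style attribute. Unlike removeAttributeNode( getAttributeNode() ),
    // this does not fail when the p has no style attribute.
    $p->removeAttribute( "style" );
}
// Note: saveHTML() prints a whole document, including the doctype, html and body wrappers.
echo $dom->saveHTML();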
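
To cover the whole question (keep only underlined spans, strip every attribute from p), the same idea can be combined with strip_tags. The following is only a sketch under stated assumptions: `clean_html()` is a hypothetical helper name, the input is assumed to be UTF-8, and attributes on the other allowed tags (for example `onclick` on an `a`) are left untouched, so it is not a security filter.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of PHP or of the original answer.
function clean_html(string $content): string
{
    // First pass: drop every tag that is not on the allow-list (span is added here).
    $content = strip_tags($content, '<p><a><br><em><strong><ul><li><span>');

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate imperfect markup
    // The meta tag tells the parser the input is UTF-8 (assumption).
    // The div is a wrapper we can find again; strip_tags removed any div from $content.
    $dom->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $content . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $dom->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return '';
    }

    // getElementsByTagName() returns a live list, so work on a snapshot.
    foreach (iterator_to_array($root->getElementsByTagName('span')) as $span) {
        $style = $span->getAttribute('style'); // '' when the attribute is absent
        $underlined = preg_match('/(?:^|;)\s*text-decoration\s*:\s*underline\s*(?:;|$)/i', $style) === 1;

        // Drain the live attribute map rather than iterating over it.
        while ($span->attributes->length > 0) {
            $span->removeAttributeNode($span->attributes->item(0));
        }

        if ($underlined) {
            // Keep the span, but only with the one style we allow.
            $span->setAttribute('style', 'text-decoration: underline;');
        } else {
            // Unwrap: move the children in front of the span, then drop the span.
            while ($span->firstChild !== null) {
                $span->parentNode->insertBefore($span->firstChild, $span);
            }
            $span->parentNode->removeChild($span);
        }
    }

    // A clean p tag: remove every attribute.
    foreach ($root->getElementsByTagName('p') as $p) {
        while ($p->attributes->length > 0) {
            $p->removeAttributeNode($p->attributes->item(0));
        }
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo clean_html('<p class="99light">Hi <span style="text-decoration: underline;">Text</span> and <span class="x">plain</span></p>');
```

In the sample call, the p should lose its class, the underlined span should survive with only its style, and the second span should be replaced by its text.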


Answer 2:

You need full DOM parsing. strip_tags will not offer the necessary security and customization. I have used the HTMLPurifier library in the past for this. It does actual parsing and lets you set whitelists, while taking care of malicious input and producing valid markup.

By "necessary security" I mean that if you try to write a custom parser you will make a mistake (don't worry, I would too) and by "customization" I mean no built-in solution will let you target only certain tags with certain attributes and values of those attributes. HTMLPurifier is the PHP library solution.
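
A whitelist for the case in the question might be configured roughly as below. This is a sketch, not a tested setup: it assumes the library was installed with Composer (package `ezyang/htmlpurifier`), the `vendor/autoload.php` path and the `$content` sample are placeholders, and the directive names are written from memory of the HTMLPurifier configuration documentation, so check them against your installed version.

```php
<?php
// Assumption: Composer autoloader in the current directory.
require_once __DIR__ . '/vendor/autoload.php';

// Placeholder input for illustration.
$content = '<p class="99light">Hi <span style="text-decoration: underline; color: red;">Text</span></p>';

$config = HTMLPurifier_Config::createDefault();
// Allowed elements and, in brackets, their allowed attributes.
// class is not listed for p, so it is dropped.
$config->set('HTML.Allowed', 'p,a[href],br,em,strong,ul,li,span[style]');
// The only CSS property allowed to survive inside style="...".
$config->set('CSS.AllowedProperties', ['text-decoration']);
// Remove spans that are left without any attribute.
$config->set('AutoFormat.RemoveSpansWithoutAttributes', true);

$purifier = new HTMLPurifier($config);
echo $purifier->purify($content);
```

Note that this allows any text-decoration value, not only underline. Restricting the value itself would need extra handling beyond this sketch.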