Allowing full HTML to be parsed in HTMLPurifier

Posted 2019-05-22 17:37

Question:

This is a problem I've had for a long time. I currently accept a full HTML page from the user as input and want to filter and clean it. The problem with HTMLPurifier is that it removes the html, head, and body tags, as well as the styles in the head. I've googled, looked at the forums, and tried implementing what was written there, with no luck. Can someone help?

What I want: to keep the HTML, HEAD, STYLE, and BODY tags.

What I have done:

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'test');
$config->set('HTML.DefinitionRev', 1);
$config->set('HTML.AllowedElements', array('html', 'head', 'body', 'style', 'div', 'p'));

// Attempt to register the document-level elements on the raw definition.
if ($def = $config->maybeGetRawHTMLDefinition()) {
    $def->addElement('html', 'Block', 'Inline', 'Common', array());
    $def->addElement('head', 'Block', 'Inline', 'Common', array());
    $def->addElement('style', 'Block', 'Inline', 'Common', array());
    $def->addElement('body', 'Block', 'Inline', 'Common', array());
}

Answer 1:

Why not use strip_tags? It supports a list of allowed tags.

http://www.php.net/manual/en/function.strip-tags.php
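
A minimal sketch of this suggestion, assuming the same allowed elements as in the question. The sample input in `$dirty` is made up for illustration. Note that strip_tags only filters tag names: it leaves attributes on allowed tags untouched and keeps the text inside removed tags, so on its own it is not a sanitizer.

```php
<?php
// Hypothetical sample input: a full page with one unwanted <script> element.
$dirty = '<html><head><style>p { color: red; }</style></head>'
       . '<body><p onclick="x">Hello</p><script>evil</script></body></html>';

// Tags to keep, in the string form accepted by strip_tags.
$allowed = '<html><head><body><style><div><p>';

$clean = strip_tags($dirty, $allowed);

// The <script> tags are gone, but their text content remains,
// and the onclick attribute on the allowed <p> tag is preserved.
echo $clean, PHP_EOL;
```

Because attributes such as onclick survive, this approach is only reasonable when the input is trusted or is filtered further afterwards.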



Answer 2:

You need to set:

$config->set('Core.ConvertDocumentToFragment', false);

For whatever reason, Core.ConvertDocumentToFragment defaults to true, even though the documentation states that "for most inputs, this processing is not necessary".

I was bitten by this too. All I got from the error collector was the cryptic message "Removed document metadata tags", which in turn is a translation of the internal message "Lexer: Extracted body".
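
For context, here is a rough sketch of how the setting and the error collector might be wired together. It assumes HTML Purifier was installed with Composer (the autoloader path is a guess) and uses a made-up sample document. It only shows where the messages come from; whether the html, head, and body tags survive still depends on the HTML definition in use.

```php
<?php
// Assumption: HTML Purifier was installed with Composer; adjust the path otherwise.
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Do not reduce a full document to the contents of its <body>.
$config->set('Core.ConvertDocumentToFragment', false);
// Record what the purifier removes or changes.
$config->set('Core.CollectErrors', true);

$purifier = new HTMLPurifier($config);

// Hypothetical sample input.
$dirty = '<html><head><style>p { color: red; }</style></head>'
       . '<body><p>Hello</p></body></html>';
$clean = $purifier->purify($dirty);

// The collector is stored in the purifier's context after purify() runs.
$errors = $purifier->context->get('ErrorCollector');
echo $errors->getHTMLFormatted($config);
echo htmlspecialchars($clean);
```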



Answer 3:

End result: HTMLPurifier does not natively support parsing full HTML documents. Either extend it or find a pass-through.
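
One possible reading of "pass-through" is to purify only the body contents and re-wrap them in the original document shell. The sketch below is an assumption, not something taken from the answers: `purifyBodyOnly` is a hypothetical helper, and the stand-in purifier uses strip_tags so the example runs without the library. The head, including any style block, is passed through unfiltered here, which is unsafe for untrusted input unless it is checked separately.

```php
<?php
/**
 * Hypothetical helper: purify only the <body> contents and keep the
 * surrounding document shell (html, head, style, body tags) as-is.
 */
function purifyBodyOnly(string $html, callable $purifyFragment): string
{
    // Group 1: everything up to and including the opening <body ...> tag.
    // Group 2: the body contents. Group 3: the last </body> and what follows.
    if (!preg_match('~^(.*?<body\b[^>]*>)(.*)(</body>.*)$~is', $html, $m)) {
        // No <body> wrapper found: treat the whole input as a fragment.
        return $purifyFragment($html);
    }
    return $m[1] . $purifyFragment($m[2]) . $m[3];
}

// Stand-in purifier for demonstration only.
$stub = function (string $fragment): string {
    return strip_tags($fragment, '<div><p>');
};

$page = '<html><head><style>p{color:red}</style></head>'
      . '<body><p>Hi</p><script>evil</script></body></html>';

$result = purifyBodyOnly($page, $stub);
echo $result, PHP_EOL;
```

With the real library, the callable could be `[$purifier, 'purify']`, where `$purifier` is an HTMLPurifier instance.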