-->

How to strip specific tags and specific attributes

2020-03-26 03:37发布

问题:

Here's the deal, I'm making a project to help teach HTML to people. Naturally, I'm afraid of that Scumbag Steve (see figure 1).

So I wanted to block ALL HTML tags, except those approved on a very specific whitelist.

Out of those approved HTML tags, I want to remove harmful attributes as well. Such as onload and onmouseover. Also, according to a whitelist.

I've thought of regex, but I'm pretty sure it's evil and not very helpful for the job.

Could anyone give me a nudge in the right direction?

Thanks in advance.


Fig 1.

回答1:

  • demo: http://so.devilmaycode.it/how-to-strip-specific-tags-and-specific-attributes-from-a-string/
require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

 // this one is needed cause otherwise stuff 
 // considered harmful like input's will automatically be deleted
$config->set('HTML.Trusted', true);

// this line say that only input, p, div will be accepted
$config->set('HTML.AllowedElements', 'input,p,div');

// set attributes for each tag
$config->set('HTML.AllowedAttributes', 'input.type,input.name,p.id,div.style');

// more extensive way of manage attribute and elements... see the docs
// http://htmlpurifier.org/live/configdoc/plain.html
$def = $config->getHTMLDefinition(true);

$def->addAttribute('input', 'type', 'Enum#text');
$def->addAttribute('input', 'name', 'Text');

// call...
$purifier = new HTMLPurifier($config);

// display...
$html = $purifier->purify($raw_html);
  • NOTE: as you asked this code will run as a Whitelist, only input, p and div are accepted and only certains attributes are accepted.


回答2:

Use Zend framework 2 strip tags. An example below to accept ul, li, p... and img (only with src attribute) and links (with only href atttribute). Everything else will be stripped. If I'm not wrong zf1 does the same thing

     $filter = new \Zend\Filter\StripTags(array(
        'allowTags'   => array(
            'ul'=>array(), 
            'li'=>array(), 
            'p'=>array(), 
            'br'=>array(), 
            'img'=>array('src'), 
            'a'=>array('href')
         ),
        'allowAttribs'  => array(),
        'allowComments' => false)
    );

    $value = $filter->filter($value);


回答3:

For tags you can use strip_tags

For attributes, refer to How can I remove attributes from an html tag?