How to strip specific tags and specific attributes

Posted 2020-03-26 03:00

Here's the deal: I'm building a project to help teach people HTML. Naturally, I'm afraid of that Scumbag Steve (see Fig. 1) who will try to abuse it.

So I want to block ALL HTML tags except those approved on a very specific whitelist.

For the approved tags, I also want to remove harmful attributes, such as onload and onmouseover, again according to a whitelist.

I've thought about regex, but I'm pretty sure it's evil and not well suited to the job.

Could anyone give me a nudge in the right direction?

Thanks in advance.


Fig. 1: Scumbag Steve

3 answers
叛逆
Post #2 · 2020-03-26 03:44

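For tags, you can use strip_tags, whose second argument lists the tags to keep. Note that it does not modify the attributes of the tags you allow.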
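To illustrate strip_tags: every tag not listed in the second argument is removed while its text content survives, but attributes on the allowed tags are left exactly as they were (the PHP manual warns about this). The sample input below is made up.

```php
<?php
// Hypothetical untrusted input mixing allowed and disallowed markup.
$input = '<p onclick="steal()">Hi <b>there</b><script>bad()</script></p>';

// Keep only <p>; <b> and <script> tags are removed, their text stays.
$output = strip_tags($input, '<p>');

echo $output, PHP_EOL;
// prints: <p onclick="steal()">Hi therebad()</p>
```

The onclick handler survives, so strip_tags on its own only covers the tag half of the question.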

For attributes, refer to the question "How can I remove attributes from an html tag?".
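If you would rather not depend on a framework, one possible approach (my own sketch, not the method from the linked question) is to parse the input with PHP's DOM extension and walk the tree, enforcing the tag whitelist and the per-tag attribute whitelist in one pass. The whitelist contents, the function names sanitize_html, clean_node and is_safe_url, and the URL rule are hypothetical choices for illustration. It assumes ext-dom is enabled and ASCII input; non-ASCII text may need extra encoding handling with loadHTML. For production use, a maintained library such as HTML Purifier is likely the safer choice.

```php
<?php
// Hypothetical whitelist: tag => attributes allowed on that tag.
const ALLOWED_TAGS = [
    'p'   => [],
    'b'   => [],
    'a'   => ['href'],
    'img' => ['src', 'alt'],
];
// Disallowed tags whose text content should be dropped as well.
const DROP_WITH_CONTENT = ['script', 'style'];

// Hypothetical, deliberately strict URL rule: accept http(s)/mailto URLs or
// colon-free (relative) values, which rejects "javascript:..." and the like.
function is_safe_url(string $url): bool
{
    $url = trim($url);
    return preg_match('~^(https?:|mailto:)~i', $url) === 1
        || strpos($url, ':') === false;
}

function clean_node(DOMNode $parent): void
{
    // Snapshot the children, because the loop modifies the tree.
    foreach (iterator_to_array($parent->childNodes, false) as $node) {
        if ($node instanceof DOMElement) {
            $tag = strtolower($node->nodeName);
            if (in_array($tag, DROP_WITH_CONTENT, true)) {
                $parent->removeChild($node);
                continue;
            }
            clean_node($node); // clean descendants first
            if (!array_key_exists($tag, ALLOWED_TAGS)) {
                // Unwrap: move the cleaned children up, then drop the tag.
                while ($node->firstChild !== null) {
                    $parent->insertBefore($node->firstChild, $node);
                }
                $parent->removeChild($node);
                continue;
            }
            // Remove every attribute not whitelisted for this tag.
            foreach (iterator_to_array($node->attributes, false) as $attr) {
                $name = strtolower($attr->nodeName);
                $allowed = in_array($name, ALLOWED_TAGS[$tag], true);
                $isUrl = in_array($name, ['href', 'src'], true);
                if (!$allowed || ($isUrl && !is_safe_url($attr->value))) {
                    $node->removeAttribute($attr->nodeName);
                }
            }
        } elseif (!($node instanceof DOMText)) {
            // Comments, processing instructions, etc. are dropped.
            $parent->removeChild($node);
        }
    }
}

function sanitize_html(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one <div> so it has a single root; the flags stop
    // libxml from adding <html>, <body> and a doctype around it.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->documentElement;
    clean_node($root);

    // Serialize the wrapper's children only, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$dirty = '<p onclick="steal()">Hi <b>there</b> '
    . '<a href="javascript:alert(1)" title="x">link</a> '
    . '<a href="https://example.com" onmouseover="x()">ok</a>'
    . '<script>bad()</script><span>kept text</span><!-- note --></p>';
echo sanitize_html($dirty), PHP_EOL;
```

Disallowed tags such as span are unwrapped so their text survives, while script and style are removed together with their content. Attributes outside each tag's list are dropped, and so are href/src values that fail the URL rule. Unwrapping rather than escaping is a design choice; escaping disallowed tags instead would show students exactly what was rejected.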

Lonely孤独者°
Post #3 · 2020-03-26 03:46

Use the StripTags filter from Zend Framework 2. The example below accepts ul, li, p and br, plus img (only with the src attribute) and links (only with the href attribute); everything else is stripped. If I'm not mistaken, ZF1 does the same thing.

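    $filter = new \Zend\Filter\StripTags(array(
        'allowTags' => array(
            'ul'  => array(),
            'li'  => array(),
            'p'   => array(),
            'br'  => array(),
            'img' => array('src'),  // img may keep only src
            'a'   => array('href'), // a may keep only href
        ),
        'allowAttribs'  => array(), // no attributes allowed on every tag
        'allowComments' => false,
    ));

    // $value holds the untrusted HTML to be filtered.
    $value = $filter->filter($value);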
做自己的国王
Post #4 · 2020-03-26 03:52
    require_once 'library/HTMLPurifier.auto.php';

    $config = HTMLPurifier_Config::createDefault();

    // Needed because otherwise elements HTML Purifier considers harmful,
    // such as input, are removed automatically.
    $config->set('HTML.Trusted', true);

    // Only input, p and div elements are accepted.
    $config->set('HTML.AllowedElements', 'input,p,div');

    // Allowed attributes, written as element.attribute pairs.
    $config->set('HTML.AllowedAttributes', 'input.type,input.name,p.id,div.style');

    // For finer-grained control over elements and attributes, see the docs:
    // http://htmlpurifier.org/live/configdoc/plain.html
    $def = $config->getHTMLDefinition(true);

    // Restrict input's type to the single value "text"; name is free text.
    $def->addAttribute('input', 'type', 'Enum#text');
    $def->addAttribute('input', 'name', 'Text');

    // Create the purifier with this configuration.
    $purifier = new HTMLPurifier($config);

    // Sanitize the untrusted markup ($raw_html holds the user's input).
    $html = $purifier->purify($raw_html);
  • Note: as you asked, this code works as a whitelist. Only input, p and div elements are accepted, and only the listed attributes are kept.