Here's the deal: I'm building a project to help teach HTML to people. Naturally, I'm afraid of Scumbag Steve (see figure 1).
So I want to block ALL HTML tags except those on a very specific whitelist.
Among the approved HTML tags, I also want to remove harmful attributes such as onload and onmouseover, again according to a whitelist.
I've thought of regex, but I'm pretty sure it's evil and not very helpful for the job.
Could anyone give me a nudge in the right direction?
Thanks in advance.
Fig 1.
- demo: http://so.devilmaycode.it/how-to-strip-specific-tags-and-specific-attributes-from-a-string/
require_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
// this setting is needed because otherwise elements
// considered harmful, such as input, are removed automatically
$config->set('HTML.Trusted', true);
// this line says that only input, p and div will be accepted
$config->set('HTML.AllowedElements', 'input,p,div');
// set attributes for each tag
$config->set('HTML.AllowedAttributes', 'input.type,input.name,p.id,div.style');
// a more extensive way to manage attributes and elements; see the docs:
// http://htmlpurifier.org/live/configdoc/plain.html
$def = $config->getHTMLDefinition(true);
$def->addAttribute('input', 'type', 'Enum#text');
$def->addAttribute('input', 'name', 'Text');
// create the purifier with this configuration...
$purifier = new HTMLPurifier($config);
// purify the raw input ($html is now safe to display)...
$html = $purifier->purify($raw_html);
- NOTE: as you asked, this code works as a whitelist: only input, p and div are accepted, and only certain attributes are accepted on them.
Use the Zend Framework 2 StripTags filter. The example below accepts ul, li, p and br, plus img (only with the src attribute) and links (only with the href attribute). Everything else will be stripped. If I'm not mistaken, ZF1 does the same thing.
$filter = new \Zend\Filter\StripTags(array(
    'allowTags' => array(
        'ul'  => array(),
        'li'  => array(),
        'p'   => array(),
        'br'  => array(),
        'img' => array('src'),
        'a'   => array('href')
    ),
    'allowAttribs'  => array(),
    'allowComments' => false
));
$value = $filter->filter($value);
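For tags, you can use strip_tags with its second argument, which lists the tags to keep. Note that strip_tags leaves the attributes of the allowed tags untouched.
For attributes, refer to "How can I remove attributes from an html tag?"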
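
To make the two steps concrete, here is a minimal sketch that combines strip_tags with PHP's built-in DOM extension. The function name sanitize_html and the shape of the $allowed array are my own assumptions for illustration, not part of any library. It also does not validate attribute values, so something like href="javascript:..." would still get through if href is whitelisted. A dedicated sanitizer is the safer choice for anything facing real users.

```php
<?php
// Hypothetical helper (name and signature are assumptions for this sketch).
// $allowed maps a lowercase tag name to the list of attributes it may keep.
function sanitize_html(string $html, array $allowed): string
{
    // Step 1: drop every tag that is not on the whitelist.
    // strip_tags removes the tags only; their text content stays.
    $tagList = '<' . implode('><', array_keys($allowed)) . '>';
    $html = strip_tags($html, $tagList);

    // Step 2: parse what is left so attributes can be inspected safely.
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy markup quietly
    // The XML prolog hints UTF-8 to libxml; the wrapper div gives one root.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    // The wrapper is the first div in document order.
    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return '';
    }

    // Step 3: remove every attribute that is not whitelisted for its tag.
    foreach ($root->getElementsByTagName('*') as $el) {
        $permitted = $allowed[strtolower($el->nodeName)] ?? [];
        // Copy the attribute list first, since we modify it while looping.
        foreach (iterator_to_array($el->attributes, false) as $attr) {
            if (!in_array(strtolower($attr->nodeName), $permitted, true)) {
                $el->removeAttribute($attr->nodeName);
            }
        }
    }

    // Step 4: serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Example usage: p may keep id, img may keep src, everything else goes.
$whitelist = ['p' => ['id'], 'img' => ['src']];
$dirty = '<p id="intro" onmouseover="steal()">Hello <em>world</em></p>'
       . '<img src="x.png" onload="steal()">';
echo sanitize_html($dirty, $whitelist), "\n";
```

The em tag is removed but its text stays, and the onmouseover and onload handlers are dropped because they are not on the per-tag attribute list.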