Is there a simple approach to add an HTML5 ruleset to HTMLPurifier?
HP can be configured to recognize new tags with:
// setup configurable HP instance
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'html5 draft');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // no caching
$def = $config->getHTMLDefinition(true);
// add a new tag
$article = $def->addElement(
    'article', // name
    'Block',   // content set
    'Flow',    // allowed children
    'Common',  // attribute collection
    array()    // attributes (none beyond the collection)
);
// add a new attribute
$def->addAttribute('a', 'contextmenu', "ID");
However, this is clearly a bit of work, since there are a lot of new HTML5 tags and attributes that would have to be registered. New global attributes should also be combinable with existing HTML 4 tags. (It's difficult to judge from the docs how to augment the core rules.) So, is there a more useful config format or array structure for feeding new and updated tag + attribute + context configuration (inline/block/empty/flow/...) into HTMLPurifier?
# mostly confused about how to extend existing tags:
$def->addAttribute('input', 'type', "...|...|...");
# or how to allow data-* attributes (if I actually wanted that):
$def->addAttribute("data-*", ...
And of course not all new HTML5 tags are fit for unrestricted allowance. HTMLPurifier is all about content filtering; defining value constraints is where it's at. <canvas>, for example, might not be that big of a deal when it appears in user content, because it's useless at best without JavaScript (which HP already filters out). But other tags and attributes might be undesirable, so a flexible configuration structure is imperative for enabling/disabling tags and their associated attributes.
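To make the question concrete, here is a minimal sketch of the kind of array-driven setup being asked about. The table layout, the `registerHtml5Rules()` helper and the `$disabled` list are hypothetical names invented for illustration, not an HTMLPurifier format; the only library calls assumed are `addElement()` and `addAttribute()` as used above. The `section` and `mark` rows are illustrative guesses at sensible content models, and wildcard `data-*` attributes are not covered.

```php
<?php
// Hypothetical rule table (not an HTMLPurifier format):
// element name => [content set, allowed children, attribute collection, attributes]
$html5Elements = array(
    'article' => array('Block',  'Flow',   'Common', array()),
    'section' => array('Block',  'Flow',   'Common', array()),
    'mark'    => array('Inline', 'Inline', 'Common', array()),
);

// Extra attributes for existing elements: [element, attribute, value type]
$html5Attributes = array(
    array('a', 'contextmenu', 'ID'),
);

/**
 * Feed the tables into a raw HTML definition, skipping disabled elements.
 * $def only needs to offer addElement() and addAttribute(), as the object
 * returned by $config->getHTMLDefinition(true) does.
 */
function registerHtml5Rules($def, array $elements, array $attributes, array $disabled = array())
{
    foreach ($elements as $name => $spec) {
        if (in_array($name, $disabled, true)) {
            continue; // tag switched off by the caller
        }
        list($type, $contents, $collection, $attrs) = $spec;
        $def->addElement($name, $type, $contents, $collection, $attrs);
    }
    foreach ($attributes as $spec) {
        list($element, $attr, $valueType) = $spec;
        if (in_array($element, $disabled, true)) {
            continue; // do not extend an element that is switched off
        }
        $def->addAttribute($element, $attr, $valueType);
    }
}
```

With the `$def` obtained from `getHTMLDefinition(true)` as shown earlier, a call such as `registerHtml5Rules($def, $html5Elements, $html5Attributes, array('mark'));` would register everything in the tables except `mark`. Keeping the rules in plain arrays means enabling or disabling a tag is a data change rather than a code change.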
(Guess I should update my research...) But there's still no practical compendium/specification (no, XML DTDs aren't one) that suits an HP configuration.
- http://simon.html5.org/html-elements
- http://www.w3.org/TR/html5-diff/#new-elements
- http://www.w3.org/TR/html5-diff/#new-attributes
(Uh, and HTML5 is no longer a draft.)
The PHP tidy extension can be configured to recognize HTML5 tags: http://tidy.sourceforge.net/docs/quickref.html#new-blocklevel-tags
Gallery Role has an experimental HTML5 parser that is based on HTMLPurifier:
https://github.com/gallery/gallery3-vendor/blob/master/htmlpurifier/modified/HTMLPurifier/Lexer/PH5P.php
I'm using a fix for WordPress, but maybe this can help you too (at least for the array part):
http://nicolasgallagher.com/using-html5-elements-in-wordpress-post-content/
http://hybridgarden.com/blog/misc/adding-html5-capability-to-wordpress/
also:
There's this configuration for HTMLPurifier to allow newer HTML5 tags.
Source: https://github.com/kennberg/php-htmlpurfier-html5