Wanted
A command line HTML5 beautifier running under Linux.
Input
Garbled, ugly HTML5 code. Possibly the result of multiple templates. You don't love it, it doesn't love you.
Output
Pure beauty. The code is nicely indented, has enough line breaks, cares for its whitespace. Rather than viewing it in a web browser, you would like to display the code on your website directly.
Suspects
- tidy does too much (heck, it alters my doctype!), and it doesn't work well with HTML5. Maybe there is a way to make it cooperate and not alter anything?
- vim does too little. It only indents. I want the program to add and remove line breaks, and to play with the whitespace inside of tags.
DEAD OR ALIVE!
HTML Tidy has been forked by the W3C and now has support for HTML5 validation.
https://github.com/w3c/tidy-html5
I suspect tidy can be made to work with the right command-line parameters.
http://tidy.sourceforge.net/docs/quickref.html
You can specify an arbitrary doctype and add new block, inline, and empty tags, and turn on and off lots of tidy's cleaning options.
Depending on what you want it to "beautify", you can probably get decent results. It probably won't manage some of the more advanced things, like rewriting the HTML content to eliminate or combine spurious elements, if it doesn't recognize them.
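As a rough sketch of what "the right command-line parameters" could look like: any tidy configuration option can be passed on the command line as `--option value`. The function name `beautify_html5` is made up for this example, and the tag lists are only illustrative; with the tidy-html5 fork the `--new-*-tags` declarations may not be needed at all. I have not checked how well classic tidy copes with every HTML5 document under these flags.

```shell
#!/bin/sh
# beautify_html5 is a hypothetical helper name, not part of tidy.
# -q suppresses nonessential output, -i indents, -utf8 sets the encoding.
# --wrap 0 disables line wrapping; --tidy-mark no omits the generator meta tag.
beautify_html5() {
    tidy -q -i -utf8 \
        --tidy-mark no \
        --wrap 0 \
        --new-blocklevel-tags "article,header,footer,section,nav" \
        --new-inline-tags "video,audio,canvas" \
        --new-empty-tags "source" \
        "$1"
}
```

Usage would be something like `beautify_html5 ugly.html > pretty.html`. Note that tidy exits with status 1 when it only has warnings, so a non-zero exit code does not necessarily mean the output is unusable.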
Copied from a live website I built in HTML5, where every page validates as proper HTML5 thanks to this snippet (PHP in this case, but the options and logic are the same for any language):
// Options for PHP's tidy extension. The new-*-tags entries declare HTML5
// elements that classic tidy does not know about.
$options = array(
    'hide-comments' => true,
    'tidy-mark' => false,
    'indent' => true,
    'indent-spaces' => 4,
    'new-blocklevel-tags' => 'article,header,footer,section,nav',
    'new-inline-tags' => 'video,audio,canvas,ruby,rt,rp',
    'new-empty-tags' => 'source',
    'doctype' => '<!DOCTYPE HTML>',
    'sort-attributes' => 'alpha',
    'vertical-space' => false,
    'output-xhtml' => true,
    'wrap' => 180,
    'wrap-attributes' => false,
    'break-before-br' => false,
);
// tidy_parse_string() returns a tidy object, so keep it apart from the markup string.
$tidy = tidy_parse_string($buffer, $options, 'utf8');
tidy_clean_repair($tidy);
// Read the cleaned markup back as a plain string.
$buffer = tidy_get_output($tidy);
// Fix a tidy doctype bug
$buffer = str_replace('<html lang="en" xmlns="http://www.w3.org/1999/xhtml">', '<!DOCTYPE HTML>', $buffer);
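Since the question asks for a command line tool, here is a hedged sketch of the same settings as a tidy configuration file, written from a small shell script. The file name `tidy-html5.conf` is an arbitrary choice. One deliberate difference from the PHP array: `doctype` is set to `omit` rather than to the HTML5 doctype string, on the assumption that you prepend `<!DOCTYPE html>` yourself instead of patching the output afterwards. I have not verified that this gives byte-identical results to the PHP version.

```shell
#!/bin/sh
# Write the options to a tidy configuration file.
# The file name tidy-html5.conf is an arbitrary choice for this sketch.
cat > tidy-html5.conf <<'EOF'
// Same settings as the PHP options array, except for doctype
hide-comments: yes
tidy-mark: no
indent: yes
indent-spaces: 4
new-blocklevel-tags: article,header,footer,section,nav
new-inline-tags: video,audio,canvas,ruby,rt,rp
new-empty-tags: source
doctype: omit
sort-attributes: alpha
vertical-space: no
output-xhtml: yes
wrap: 180
wrap-attributes: no
break-before-br: no
EOF

# Beautify the file named in the first argument, but only if an argument
# was given and tidy is actually installed. Output goes to stdout.
if [ -n "${1:-}" ] && command -v tidy >/dev/null 2>&1; then
    tidy -q -utf8 -config tidy-html5.conf "$1"
fi
```

Saved as, say, `beautify.sh`, it could be run as `sh beautify.sh ugly.html > pretty.html`. Keeping the options in a file makes it easier to share one configuration between the command line and other callers.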
If you use Haml as your nanoc filter, your HTML will automatically be pretty-printed. You can set HTML5 output as an option.