I've googled around but can't find any HTML minification scripts.
It occurred to me that maybe there's nothing more to HTML minification than removing all unneeded whitespace.
Am I missing something, or has my Google-fu been lost?
You have to be careful when removing stuff from HTML, as it's a fragile language. Depending on how your pages are coded, some of that whitespace might be significant; also, if you have CSS styles such as
white-space: pre
then you may need to keep the whitespace. Plus there are numerous browser bugs, etc., and basically any character in an HTML file might be there to satisfy some requirement or appease some browser.
In my opinion your best bet is to design the pages well with CSS techniques (I was recently able to take an important page on the site I work for and reduce its size by 50% just by recoding it using CSS instead of tables and nested style="..." attributes). Then use gzip to reduce the size of your pages for browsers that understand gzip. This will save bandwidth while preserving the structure of the HTML.
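To give a rough feel for the gzip suggestion, here is a minimal Python sketch. The page content is made up for illustration, and in practice the web server usually does the compression for you; this just shows that gzip shrinks repetitive markup and hands back exactly the same bytes, whitespace included.

```python
import gzip

# Hypothetical page: repetitive, indented markup, as real HTML often is.
rows = "".join(f'    <li class="item">Row {i}</li>\n' for i in range(200))
page = ("<ul>\n" + rows + "</ul>\n").encode("utf-8")

compressed = gzip.compress(page)

# gzip is lossless: decompressing returns the exact original bytes,
# so whitespace that matters (e.g. under white-space: pre) is untouched.
restored = gzip.decompress(compressed)

print(len(page), "bytes raw")
print(len(compressed), "bytes gzipped")
print(restored == page)  # True
```

The actual savings depend entirely on the page, so treat the numbers this prints as an illustration rather than a benchmark.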
There's a pretty lengthy discussion of this topic on this WordPress blog. You can also find a long proposed solution there that uses PHP and HTML Tidy.
Beyond HTML Tidy and removing whitespace, as the other answers mentioned, there isn't much.
The rest is more of a manual task: pulling style attributes out into CSS (hopefully you're not using FONT tags, etc.) and using fewer tags and attributes where possible (for example, not embedding <strong> tags in an element but using CSS to make the whole element font-weight: bold, unless of course it makes semantic sense to use <strong>).
Yes, I guess it's pretty much removing whitespace and comments. You cannot replace identifiers with shorter ones like you can in JavaScript, since chances are that CSS classes or JavaScript will depend on those identifiers.
Also, you should be careful when removing whitespace and make sure that there is always at least one whitespace character left, otherwise allyourtextwilllooklikethis.
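As a rough illustration of "remove comments, but always leave one whitespace character", here is a small Python sketch. The function name `collapse_whitespace` and the sample markup are made up for the example, and it deliberately ignores cases like `<pre>` or conditional comments, so it is not safe to run on arbitrary pages.

```python
import re

def collapse_whitespace(html: str) -> str:
    """Strip comments and shrink each whitespace run to a single space."""
    # Remove HTML comments (conditional comments are not special-cased here).
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Replace every run of spaces, tabs and newlines with ONE space,
    # never with nothing, so adjacent words stay separated.
    html = re.sub(r"\s+", " ", html)
    return html.strip()

sample = """<p>
    all your   text
    <em>will</em>   still
    <!-- note to self -->
    look like this
</p>"""

print(collapse_whitespace(sample))
# <p> all your text <em>will</em> still look like this </p>
```

Replacing each run with a single space rather than an empty string is what keeps the words from running together.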
Here is a minifier for HTML5 written in PHP.
After that, you have shorter, one-line HTML code.
It would be better to put the regular expressions in an array, but be careful to escape the backslashes.
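For what it's worth, the "array of regular expressions" idea could look something like the Python sketch below. This is not the PHP minifier mentioned above; the names `RULES`, `PRE_BLOCK` and `minify` are hypothetical, and leaving `<pre>` blocks untouched is an extra assumption borrowed from the earlier warning about significant whitespace.

```python
import re

# Hypothetical rule table: (pattern, replacement) pairs, applied in order.
# Raw strings (r"...") mean backslashes like \s need no double escaping.
RULES = [
    (r"<!--.*?-->", ""),  # strip comments
    (r"\s+", " "),        # collapse whitespace runs to one space
]

# Capturing group so re.split() keeps the <pre> blocks in its result.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.DOTALL | re.IGNORECASE)

def minify(html: str) -> str:
    parts = PRE_BLOCK.split(html)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd indexes are the captured <pre> blocks: keep them verbatim.
            out.append(part)
            continue
        for pattern, replacement in RULES:
            part = re.sub(pattern, replacement, part, flags=re.DOTALL)
        out.append(part)
    return "".join(out).strip()

print(minify("<div>\n  <!-- nav -->\n  <p>Hi   there</p>\n</div>"))
# <div> <p>Hi there</p> </div>
```

If the page contains no `<pre>` blocks the result is a single line; otherwise the line breaks inside those blocks survive. Adding a rule is just a matter of appending another pair to the list.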