Google's pages suggest minifying HTML, that is, removing all the unnecessary spaces.
CodeIgniter has a feature for gzipping output, or it can be done via .htaccess.
But I would still like to remove unnecessary spaces from the final HTML output as well.
I played a bit with this piece of code to do it, and it seems to work. It does indeed result in HTML without excess spaces, and it also removes tab formatting:
class Welcome extends CI_Controller
{
    // CodeIgniter passes the finalized output to _output() as its argument.
    public function _output($output)
    {
        echo preg_replace('!\s+!', ' ', $output);
    }

    public function index()
    {
        ...
    }
}
The problem is that there may be tags like <pre>, <textarea>, etc., which may contain significant whitespace, and this regular expression would collapse it too.
So, how do I remove excess whitespace from the final HTML with a regular expression, without affecting the spaces or formatting inside these tags?
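To make the problem concrete, here is a minimal sketch. The sample string is made up for illustration; the pattern is the one used in `_output()` above.

```php
<?php
// Made-up sample input: the line break and indentation inside <pre> are meaningful.
$input = "<pre>line 1\n    line 2</pre>";

// The naive pattern treats every whitespace run the same way...
$naive = preg_replace('!\s+!', ' ', $input);

// ...so the formatting inside <pre> is flattened onto one line.
var_dump($naive === "<pre>line 1 line 2</pre>"); // bool(true)
```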
Thanks to @Alan Moore I got the answer. This worked for me:
echo preg_replace('#(?ix)(?>[^\S ]\s*|\s{2,})(?=(?:(?:[^<]++|<(?!/?(?:textarea|pre)\b))*+)(?:<(?>textarea|pre)\b|\z))#', ' ', $output);
ridgerunner did a very good job of analyzing this regular expression. I ended up using his solution. Cheers to ridgerunner.
I implemented the answer from @ridgerunner in two projects, and ended up hitting some severe slowdowns (10-30 second request times) in staging for one of the projects. I found out that I had to set both pcre.recursion_limit and pcre.backtrack_limit quite low for it to even work, but even then it would give up after about 2 seconds of processing and return false.
Since then, I've replaced it with this solution (with an easier-to-grasp regex), which is inspired by the outputfilter.trimwhitespace function from Smarty 2. It does no backtracking or recursion, and works every time (instead of catastrophically failing once in a blue moon):
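A minimal sketch of such a placeholder-based filter, under stated assumptions: the function name `minify_html_with_placeholders()` and the `@@PROTECTED_n@@` placeholder format are hypothetical, and this is an illustration of the general idea (stash the protected elements, collapse whitespace, put them back) rather than the exact code this answer or Smarty uses. It also assumes the placeholder text never appears in your real output.

```php
<?php
// Sketch only: the function name and placeholder format are hypothetical.
function minify_html_with_placeholders(string $html): string
{
    $protected = [];

    // 1. Swap each <pre>, <textarea> and <script> element for a numbered placeholder.
    $stashed = preg_replace_callback(
        '#<(pre|textarea|script)\b[^>]*>.*?</\1\s*>#is',
        function (array $m) use (&$protected): string {
            $key = '@@PROTECTED_' . count($protected) . '@@';
            $protected[$key] = $m[0];
            return $key;
        },
        $html
    );
    if ($stashed === null) {
        return $html; // PCRE error: fall back to the unminified output.
    }

    // 2. Collapse every remaining whitespace run to a single space.
    $collapsed = preg_replace('/\s+/', ' ', $stashed);
    if ($collapsed === null) {
        return $html;
    }

    // 3. Restore the protected elements byte for byte.
    return $protected ? strtr($collapsed, $protected) : $collapsed;
}
```

In a CodeIgniter controller this could be called as `echo minify_html_with_placeholders($output);` inside `_output($output)`. Falling back to the original HTML on a PCRE error is a design choice of this sketch: the page is served unminified instead of blank.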
For those curious about how Alan Moore's regex works (and yes, it does work), I've taken the liberty of commenting it so it can be read by mere mortals:
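As a sketch, here is the one-line pattern quoted earlier spread out in free-spacing (x) mode. The comments are one reading of the regex, added for illustration, and are not necessarily the wording of the original annotation. The `%` delimiter, the variable names, and the sample string are choices made for this example.

```php
<?php
// Same pattern as the one-liner, with the (?ix) flags moved to the end.
$commented_re = '%
    (?>                # Atomic group: the whitespace run that gets replaced.
        [^\S ]\s*      #   A whitespace char that is not a plain space, then any more whitespace,
      | \s{2,}         #   or two or more whitespace chars of any kind.
    )
    (?=                # Lookahead: checks what follows without consuming it.
      (?:              #   (Redundant wrapper group, kept from the original.)
        (?:            #   Skip ahead over any number of:
            [^<]++     #     runs of text with no "<" in them,
          | <(?!/?(?:textarea|pre)\b)  # or a "<" that neither opens nor closes a protected tag.
        )*+            #   Possessive: never give back what was skipped.
      )
      (?:              #   The skip must have stopped at either:
          <(?>textarea|pre)\b          # the OPENING of a protected tag,
        | \z           #     or the very end of the input.
      )                #   Stopping at a closing tag means we are inside one: no match.
    )
%ix';

// Made-up sample: whitespace inside <pre> must survive, the rest collapses.
$sample = "<p>a \n\n b</p>\n<pre>x\n\n  y</pre>\n<p>c</p>";
$result = preg_replace($commented_re, ' ', $sample);
```

The key idea is that the lookahead scans forward to the next protected tag: if that tag is an opening tag (or the input ends first), the whitespace is outside any protected element and is safe to collapse.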
I'm new around here, but I can see right off that Alan is quite good at regex. I would only add the following suggestions.
The <SCRIPT> element should be added to the <PRE> and <TEXTAREA> blacklist.
The 'S' PCRE "study" modifier speeds up this regex by about 20%.
The alternation group in the lookahead, (?:[^<]++|<(?!/?(?:textarea|pre)\b))*+, is susceptible to excessive PCRE recursion on large target strings, which can result in a stack overflow causing the Apache/PHP executable to silently seg-fault and crash with no warning. (The Win32 build of Apache httpd.exe is particularly susceptible to this because it has only a 256KB stack, compared to the *nix executables, which are typically built with an 8MB stack or more.) Philip Hazel (the author of the PCRE regex engine used in PHP) discusses this issue in the documentation: PCRE DISCUSSION OF STACK USAGE. Although Alan has correctly applied the same fix as Philip shows in this document (applying a possessive plus to the first alternative), there will still be a lot of recursion if the HTML file is large and has a lot of non-blacklisted tags. For example, on my Win32 box (with an executable having a 256KB stack), the script blows up with a test file of only 60KB. Note also that PHP unfortunately does not follow the recommendations and sets the default recursion limit way too high at 100000. (According to the PCRE docs, this should be set to a value equal to the stack size divided by 500.)
Here is an improved version which is faster than the original, handles larger input, and gracefully fails with a message if the input string is too large to handle:
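A minimal sketch that applies the suggestions listed above (adding `script` to the blacklist, adding the `S` modifier, lowering `pcre.recursion_limit`, and checking for failure). The function name `collapse_whitespace_safe()` is hypothetical, the 8MB stack size is an assumption about the server, and this is an illustration of those suggestions, not necessarily the exact code from the original answer.

```php
<?php
// Assumption: an 8MB stack (typical *nix build); 8388608 / 500 = 16777.
// For a 256KB stack the same rule gives 262144 / 500 = 524.
ini_set('pcre.recursion_limit', '16777');

// Hypothetical helper: collapses whitespace outside <textarea>, <pre> and <script>.
// Throws instead of crashing or returning garbage when PCRE gives up.
function collapse_whitespace_safe(string $html): string
{
    $re = '%
        (?> [^\S ]\s* | \s{2,} )          # whitespace run other than a single space
        (?=                               # only if we are not inside a blacklisted tag
            (?: [^<]++ | <(?!/?(?:textarea|pre|script)\b) )*+
            (?: <(?>textarea|pre|script)\b | \z )
        )
    %Six';

    $result = preg_replace($re, ' ', $html);
    if ($result === null) {
        // preg_replace() returns null on error, e.g. when a PCRE limit is hit.
        throw new RuntimeException(
            'PCRE error ' . preg_last_error() . ': input too large to minify.'
        );
    }
    return $result;
}
```

A caller can catch the exception and echo the unminified output instead, so an oversized page degrades to "not minified" rather than to a failed request.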
p.s. I am intimately familiar with this PHP/Apache seg-fault problem, as I was involved with helping the Drupal community while they were wrestling with this issue. See: Optimize CSS option causes php cgi to segfault in pcre function "match". We also experienced this with the BBCode parser on the FluxBB forum software project.
Hope this helps.