How to enable gzip compression using PHP Simple HT

Posted 2019-04-09 23:01

I have tried a few things to enable gzip compression using PHP Simple HTML DOM Parser, but nothing has worked so far. Using ini_set I've managed to change the user agent, so I figured it might also be possible to enable gzip compression the same way.

include("simpdom/simple_html_dom.php");
ini_set('zlib.output_compression', 'On');   
$url = 'http://www.whatsmyip.org/http_compression/';
$html = file_get_html($url);
print $html;

The site above reports whether the request used compression. Please let me know if I am going about this completely the wrong way.

====

UPDATE

For anyone else trying to achieve the same thing, it's best to just use cURL, then use the dom parser like so:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // Define target site
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return page in string
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2');
curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // Ask for gzip and decode the response
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects

$return = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);

$html = str_get_html($return);
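
One thing the snippet above leaves out is a check for failure: curl_exec() returns false when the request fails, and that value would go straight into the parser. Here is a minimal sketch of the same request wrapped in a function, assuming the cURL extension is loaded. The names gzip_curl_options() and fetch_page() are made up for illustration and are not part of cURL or Simple HTML DOM.

```php
<?php
// Hypothetical helper: the options from the snippet above, collected in one array.
function gzip_curl_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body as a string
        CURLOPT_ENCODING       => 'gzip', // send Accept-Encoding: gzip and decode the reply
        CURLOPT_TIMEOUT        => 5,      // give up after 5 seconds
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
    ];
}

// Hypothetical helper: fetch a page and throw instead of returning false.
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, gzip_curl_options($url));

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return $body;
}
```

With simple_html_dom.php included, the result can be parsed as before: $html = str_get_html(fetch_page($url));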

Tags: php dom gzip
2 answers
Rolldiameter
#2 · 2019-04-09 23:04

Just add the following line at the very top of the PHP script that outputs the data:

  ob_start("ob_gzhandler");

Reference

-------Update--------

You can also try to enable gzip compression sitewide via a .htaccess file. Something like this should gzip all of your site's content except images:

# Insert filter
SetOutputFilter DEFLATE

# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html

# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip

# MSIE masquerades as Netscape, but it is fine
# BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

# Don't compress images
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png)$ no-gzip dont-vary

# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
疯言疯语
#3 · 2019-04-09 23:09

CURLOPT_ENCODING works on the request side: it tells the server that the client accepts gzipped data, so the response can come back gzipped. The server-side settings (ob_start("ob_gzhandler") or the zlib.output_compression ini setting) tell the server to OUTPUT gzipped data.

Without it, your request looks just like one from a browser that doesn't support gzip, so the page is served uncompressed. To accept gzipped data, you have to use cURL, which lets you set that on the request.
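
The two sides of that distinction can be tried locally without a web server. The sketch below assumes the zlib extension is loaded. gzencode() stands in for what a gzip-enabled server sends, and gzdecode() for what a gzip-aware client does on receipt (cURL performs that step itself when CURLOPT_ENCODING is set). The function names server_output() and client_read() are hypothetical.

```php
<?php
// Hypothetical stand-in for the server side: compress the page body
// only when gzip output is enabled.
function server_output(string $page, bool $gzipEnabled): string
{
    return $gzipEnabled ? gzencode($page) : $page;
}

// Hypothetical stand-in for the client side. A real client looks at the
// Content-Encoding response header; this sketch checks for the gzip
// magic bytes (0x1f 0x8b) instead, to stay self-contained.
function client_read(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body; // not gzipped: use as received
}
```

Passing a page through server_output($page, true) and then client_read() gives back the original string, while a client that skips the decoding step is left holding the compressed bytes.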
