
Question:

I'm using:

data = urllib2.urlopen(url).read()

I want to know:

How can I tell if the data at a URL is gzipped?
Does urllib2 automatically uncompress the data if it is gzipped? Will the data always be a string?

Answer 1:

How can I tell if the data at a URL is gzipped?

This checks if the content is gzipped and decompresses it:

import gzip
import urllib2
from StringIO import StringIO

# Ask the server for a gzip-compressed response.
request = urllib2.Request('http://example.com/')
request.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(request)

# urllib2 does not decompress for us, so check the header and do it manually.
if response.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(response.read())
    data = gzip.GzipFile(fileobj=buf).read()
else:
    data = response.read()

Does urllib2 automatically uncompress the data if it is gzipped? Will the data always be a string?

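No. urllib2 doesn't automatically decompress the data. It also doesn't request compression on its own, so a server will normally send the data uncompressed. Compression only comes into play if you ask for it yourself with request.add_header('Accept-Encoding', 'gzip, deflate'), and in that case you also have to decompress the response yourself.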
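
If you do request 'gzip, deflate', the body can come back in either encoding. The following is a minimal sketch of a helper that picks the right decompressor from the Content-Encoding header value. The name decode_body is hypothetical and not part of urllib2. The sketch works on a plain byte string, so it does not depend on how the response was fetched.

```python
import gzip
import io
import zlib


def decode_body(body, content_encoding):
    """Return body decompressed according to a Content-Encoding header value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.GzipFile(fileobj=io.BytesIO(body)).read()
    if encoding == "deflate":
        try:
            # In HTTP, "deflate" nominally means zlib-wrapped data.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw DEFLATE stream with no zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No header, "identity", or an unrecognized value: return unchanged.
    return body
```

With the request from the example above, this could be called as decode_body(response.read(), response.info().get('Content-Encoding')).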

Answer 2:

If you are talking about a simple .gz file, then no, urllib2 will not decode it. You will get the unchanged .gz file as output.
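
In that case there is no Content-Encoding header to inspect, but a gzip stream can be recognized by its first two bytes. A minimal sketch, assuming the whole download is already in memory as a byte string; the helper names looks_gzipped and maybe_gunzip are hypothetical:

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_gzipped(data):
    """Heuristic: True if data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def maybe_gunzip(data):
    """Decompress data if it looks like gzip, otherwise return it unchanged."""
    if looks_gzipped(data):
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data
```

This is only a heuristic: other binary data could happen to start with the same two bytes, in which case the decompression step would raise an error.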

If you are talking about automatic HTTP-level compression using Content-Encoding: gzip or deflate, then that has to be deliberately requested by the client using an Accept-Encoding header.

urllib2 doesn't set this header, so the response it gets back will not be compressed. You can safely fetch the resource without having to worry about compression (though, because the data is sent uncompressed, the transfer may take longer).

Answer 3:

Your question has been answered, but for a more comprehensive implementation, take a look at Mark Pilgrim's implementation of this. It covers gzip, deflate, safe URL parsing and much more. It was written for a widely used RSS parser, but it is a useful reference nevertheless.

Answer 4:

It appears urllib3 handles this automatically now.

Reference headers:

HTTPHeaderDict({
    'ETag': '"112d13e-574c64196bcd9-gzip"',
    'Vary': 'Accept-Encoding',
    'Content-Encoding': 'gzip',
    'X-Frame-Options': 'sameorigin',
    'Server': 'Apache',
    'Last-Modified': 'Sat, 01 Sep 2018 02:42:16 GMT',
    'X-Content-Type-Options': 'nosniff',
    'X-XSS-Protection': '1; mode=block',
    'Content-Type': 'text/plain; charset=utf-8',
    'Strict-Transport-Security': 'max-age=315360000; includeSubDomains',
    'X-UA-Compatible': 'IE=edge',
    'Date': 'Sat, 01 Sep 2018 14:20:16 GMT',
    'Accept-Ranges': 'bytes',
    'Transfer-Encoding': 'chunked'
})

Reference code:

import urllib3


class EDDBMultiDataFetcher(object):
    def __init__(self):
        self.files_dict = {
            'Populated Systems': 'http://eddb.io/archive/v5/systems_populated.jsonl',
            'Stations': 'http://eddb.io/archive/v5/stations.jsonl',
            'Minor factions': 'http://eddb.io/archive/v5/factions.jsonl',
            'Commodities': 'http://eddb.io/archive/v5/commodities.json',
        }
        self.http = urllib3.PoolManager()

    def fetch_all(self):
        for item, url in self.files_dict.items():
            self.fetch(item, url)

    def fetch(self, item, url, save_file=None):
        # save_file is accepted but not used yet.
        print("Fetching: " + item)
        # Only request encodings that urllib3 can decode on its own.
        response = self.http.request(
            'GET',
            url,
            headers={'Accept-Encoding': 'gzip, deflate'})
        # response.data has already been decompressed by urllib3 here,
        # so no gzip or io handling is needed.
        data = response.data.decode('utf-8')
        print("Fetch complete")
        print(data)
        print(response.headers)


if __name__ == '__main__':
    print("Fetching files from eddb.io")
    fetcher = EDDBMultiDataFetcher()
    fetcher.fetch_all()
