I have a gzip file and I am trying to read it via Python as below:
import zlib
do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)
It throws this error:
zlib.error: Error -3 while decompressing: incorrect header check
How can I overcome it?
Update: dnozay's answer explains the problem and should be the accepted answer.
Try the gzip module; the code below is straight from the Python docs.
import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()
You have this error:
zlib.error: Error -3 while decompressing: incorrect header check
This is most likely because you are trying to check headers that are not there, e.g. your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).
choosing windowBits
But zlib can decompress all those formats:
- to (de-)compress deflate format, use wbits = -zlib.MAX_WBITS
- to (de-)compress zlib format, use wbits = zlib.MAX_WBITS
- to (de-)compress gzip format, use wbits = zlib.MAX_WBITS | 16
See the documentation at http://www.zlib.net/manual.html#Advanced (section inflateInit2).
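The same wbits values are accepted by zlib.decompressobj, which is what the question uses. For large files it is usually better to feed the decompressor in chunks rather than reading the whole file into memory. A minimal Python 3 sketch, assuming the input is a single-member gzip file (hence MAX_WBITS | 16); the helper name `iter_gunzip` and the chunk size are arbitrary choices for illustration, and an in-memory buffer stands in for open('abc.gz', 'rb'):

```python
import gzip
import io
import zlib


def iter_gunzip(fileobj, chunk_size=64 * 1024):
    """Yield decompressed chunks from a binary file object holding gzip data."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield decomp.decompress(chunk)
    # Return anything still buffered inside the decompressor
    yield decomp.flush()


# In-memory stand-in for open('abc.gz', 'rb')
original = b"hello gzip " * 1000
fake_file = io.BytesIO(gzip.compress(original))
restored = b"".join(iter_gunzip(fake_file, chunk_size=100))
print(restored == original)  # True
```

Files made of several concatenated gzip members are not covered by this sketch; the gzip module reads those transparently.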
examples
test data:
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>>
>>> text = b'test'
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>
obvious test for zlib:
>>> zlib.decompress(zlib_data)
'test'
test for deflate:
>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
'test'
test for gzip:
>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
'test'
the data is also compatible with the gzip module:
>>> import gzip
>>> import io
>>> fio = io.BytesIO(gzip_data)
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
'test'
>>> f.close()
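On Python 3.2 and later, in-memory gzip data does not need a file object at all: gzip.decompress takes the bytes directly, and gzip.compress output can likewise be read back with zlib using MAX_WBITS | 16. A short sketch reproducing the gzip test data with bytes input:

```python
import gzip
import zlib

text = b"test"
# Produce gzip-framed data the same way as above (wbits = MAX_WBITS | 16)
comp = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
gzip_data = comp.compress(text) + comp.flush()

# gzip.decompress parses the gzip header and trailer itself
print(gzip.decompress(gzip_data))  # b'test'

# The reverse direction: gzip module output, read back through zlib
print(zlib.decompress(gzip.compress(text), zlib.MAX_WBITS | 16))  # b'test'
```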
automatic header detection (zlib or gzip)
adding 32 to windowBits will trigger header detection:
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
'test'
using gzip instead
or you can ignore zlib and use the gzip module directly; but please remember that under the hood, gzip uses zlib.
fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
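In Python 3 the same read is usually written with a with-block so the file is closed even if decompression fails, and mode 'rt' gives decoded text instead of bytes. The sketch below creates its own sample file in a temporary directory so it runs on its own; UTF-8 is an assumed encoding. It also shows that a file which is not gzip at all is only rejected when it is read, with an OSError (gzip.BadGzipFile on 3.8+ is a subclass):

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a sample file; in practice 'abc.gz' already exists
    path = os.path.join(tmp, "abc.gz")
    with gzip.open(path, "wb") as fh:
        fh.write(b"line one\nline two\n")

    # 'rb' yields the decompressed bytes; the with-block closes the file
    with gzip.open(path, "rb") as fh:
        cdata = fh.read()

    # 'rt' decodes to str; UTF-8 is an assumption about the content
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        lines = fh.read().splitlines()

    # A plain file is not rejected on open, only when reading starts
    bad_path = os.path.join(tmp, "plain.txt")
    with open(bad_path, "wb") as f:
        f.write(b"not gzip at all")
    try:
        with gzip.open(bad_path, "rb") as fh:
            fh.read()
        is_gzip = True
    except OSError:  # gzip.BadGzipFile (3.8+) subclasses OSError
        is_gzip = False

print(cdata)    # b'line one\nline two\n'
print(lines)    # ['line one', 'line two']
print(is_gzip)  # False
```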
I just solved the "incorrect header check" problem when uncompressing gzipped data.
You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the 2 version).
Yes, this can be very frustrating. A typically shallow reading of the documentation presents zlib as an API to gzip compression, but by default (when not using the gz* functions) it neither creates nor decompresses the gzip format. You have to pass this not very prominently documented flag.
Funnily enough, I had that error when trying to work with the Stack Overflow API using Python.
I managed to get it working with the GzipFile object from the gzip module, roughly like this:
import gzip
gzip_file = gzip.GzipFile(fileobj=open('abc.gz', 'rb'))
file_contents = gzip_file.read()
My case was to decompress email messages stored in a Bullhorn database. The snippet is the following:
import pyodbc
import zlib
cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')
for msg in cursor.fetchall():
    # the magic is in the second parameter: use a negative value for raw deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)
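Without a database at hand, the same call can be tried on any headerless deflate blob. The sketch below fakes a column value with a raw deflate compressor, standing in for msg.commentsCompressed, and stores it in a bytearray because a driver may hand back binary columns in that type. In Python 3, zlib.decompress accepts any bytes-like object, so the bytes() conversion is harmless but not strictly required. The helper name `inflate_blob`, the sample message, and the UTF-8 decoding are all assumptions for illustration:

```python
import zlib


def inflate_blob(blob) -> bytes:
    """Decompress a headerless (raw deflate) binary column value."""
    # Negative wbits = raw deflate: no zlib or gzip header is expected
    return zlib.decompress(bytes(blob), -zlib.MAX_WBITS)


# Hypothetical stand-in for a database value: raw deflate data in a bytearray
comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
fake_column = bytearray(comp.compress(b"Hello from the inbox") + comp.flush())

body = inflate_blob(fake_column)
# The text encoding of the stored messages is an assumption here
print(body.decode("utf-8"))  # Hello from the inbox
```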
Just add the header 'Accept-Encoding': 'identity' to ask the server for an uncompressed response:
import requests
requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})
https://github.com/requests/requests/issues/3849