The Question
I am parsing large compressed files in Python 2.7.6 and would like to know the uncompressed file size before starting. I am trying to use the second technique presented in this SO answer. It works for bzip2-formatted files but not for gzip-formatted files. What is different about the two compression algorithms that causes this?
Example Code
This code snippet demonstrates the behavior, assuming you have "test.bz2" and "test.gz" present in your current working directory:
import os
import bz2
import gzip

bz = bz2.BZ2File('test.bz2', mode='r')
bz.seek(0, os.SEEK_END)
bz.close()

gz = gzip.GzipFile('test.gz', mode='r')
gz.seek(0, os.SEEK_END)
gz.close()
The following traceback is shown:
Traceback (most recent call last):
  File "zip_test.py", line 10, in <module>
    gz.seek(0, os.SEEK_END)
  File "/usr/lib64/python2.6/gzip.py", line 420, in seek
    raise ValueError('Seek from end not supported')
ValueError: Seek from end not supported
Why does this work for *.bz2 files but not *.gz files?
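For context, here are two ways of getting the size of a gzip file without seeking from the end. This is only a rough sketch, not an explanation of the difference. The function names `size_by_reading` and `isize_trailer` are hypothetical helpers made up for illustration. The first decompresses the whole stream once and counts the bytes. The second reads the 4-byte ISIZE field that the gzip format (RFC 1952) stores at the end of each member. ISIZE holds the uncompressed size modulo 2**32, so it is assumed to be reliable only for single-member files smaller than 4 GiB.

```python
import gzip
import os
import struct


def size_by_reading(path, chunk_size=1024 * 1024):
    """Count uncompressed bytes by decompressing the stream once in chunks."""
    total = 0
    f = gzip.open(path, 'rb')
    try:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    finally:
        f.close()
    return total


def isize_trailer(path):
    """Read the ISIZE trailer field (uncompressed size modulo 2**32).

    Assumption: the file holds a single gzip member under 4 GiB;
    otherwise this value does not equal the true uncompressed size.
    """
    f = open(path, 'rb')
    try:
        # The last 4 bytes of the raw file, little-endian unsigned int.
        f.seek(-4, os.SEEK_END)
        return struct.unpack('<I', f.read(4))[0]
    finally:
        f.close()
```

The first approach is slow for large files but always correct. The second is instant, because it seeks in the raw compressed file rather than in the decompressed stream, but it depends on the stated assumption.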