Using gzip, tell() returns the offset in the uncompressed file.
In order to show a progress bar, I want to know the original (uncompressed) size of the file.
Is there an easy way to find out?
GzipFile.size stores the uncompressed size, but it's only incremented when you read the file, so you should prefer len(fd.read()) instead of the non-public GzipFile.size.
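A minimal sketch of the "just read it and count" idea, done in chunks so the whole file does not have to sit in memory at once. The helper name uncompressed_size and the 1 MiB chunk size are my own choices, not anything from the gzip module. Note that this decompresses the entire file once before any real processing starts.

```python
import gzip


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Count decompressed bytes by reading the stream once (hypothetical helper)."""
    total = 0
    with gzip.open(path, "rb") as fd:
        while True:
            chunk = fd.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            total += len(chunk)
    return total
```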
I am not sure about performance, but this could be achieved without knowing gzip magic by seeking to the end of the opened stream with io.SEEK_END and taking the resulting offset. This should also work for other (compressed) stream readers like bz2 or the plain open.
EDIT: as suggested in the comments, the literal 2 in the seek call was replaced by io.SEEK_END, which is definitely more readable and probably more future-proof.
EDIT: Works only in Python 3.
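The seek-to-end approach could look roughly like the sketch below. The function name size_by_seeking is made up for illustration. As far as I can tell, seeking relative to the end forces the gzip reader to decompress everything up to that point, so this is not free on large files.

```python
import gzip
import io


def size_by_seeking(path):
    """Return the uncompressed size by seeking to the end (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        # seek() returns the new absolute position in the uncompressed stream
        return f.seek(0, io.SEEK_END)
```

If you keep the file open instead of closing it, remember to call seek(0) again before reading the data.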
Looking at the source for the gzip module, I see that the underlying file object for GzipFile seems to be fileobj, so it may be possible to query that object directly.
Maybe it would be good to do some sanity checking before doing that, like checking that the attribute exists with hasattr. Not exactly a public API, but...
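One possible reading of this suggestion, as a sketch only: for a progress bar you may not need the uncompressed size at all, because the position in the compressed file divided by its size on disk is already a usable fraction. The generator name read_with_progress and the chunk size are hypothetical, and fileobj is treated here as an undocumented attribute that might be missing, hence the getattr check.

```python
import gzip
import os


def read_with_progress(path, chunk_size=64 * 1024):
    """Yield (chunk, fraction) pairs; fraction is based on compressed bytes consumed."""
    compressed_total = os.path.getsize(path)
    with gzip.open(path, "rb") as gz:
        # Assumption: GzipFile keeps the underlying raw file in .fileobj
        raw = getattr(gz, "fileobj", None)
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            if raw is not None and compressed_total:
                fraction = raw.tell() / compressed_total
            else:
                fraction = None  # progress unknown, caller should handle this
            yield chunk, fraction
```

Because the reader pulls compressed data in buffered blocks, the fraction moves in steps rather than smoothly, which is usually fine for a progress bar.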
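The last 4 bytes of the .gz file hold the original size of the file as a little-endian integer, modulo 2^32, so the value is only reliable for files smaller than 4 GiB.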
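Reading that trailer directly might look like the sketch below. The function name isize_from_trailer is my own. It assumes a single-member .gz file with nothing appended after the trailer; for concatenated gzip members it would only report the size of the last one.

```python
import os
import struct


def isize_from_trailer(path):
    """Read the 4-byte ISIZE field at the end of a .gz file (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # jump to the last four bytes
        # "<I" = little-endian unsigned 32-bit integer
        return struct.unpack("<I", f.read(4))[0]
```

This is very cheap because nothing is decompressed, which makes it a reasonable choice for a progress bar when the inputs are known to be under 4 GiB.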
Unix way: run "gunzip -l file.gz" via subprocess.check_output or os.popen (subprocess.call only returns the exit code), then capture and parse its output.
The gzip format specifies a field called ISIZE that contains the size of the original (uncompressed) input data modulo 2^32.
In gzip.py, which I assume is what you're using for gzip support, there is a method called _read_eof. There you can see that the ISIZE field is being read, but only to compare it to self.size for error detection. This then should mean that GzipFile.size stores the actual uncompressed size. However, I think it's not exposed publicly, so you might have to hack it in to expose it. Not so sure, sorry.
I just looked all of this up right now, and I haven't tried it so I could be wrong. I hope this is of some use to you. Sorry if I misunderstood your question.