Get uncompressed size of a .gz file in python

Posted 2019-01-27 17:46

Using gzip, tell() returns the offset in the uncompressed file.
In order to show a progress bar, I want to know the original (uncompressed) size of the file.
Is there an easy way to find out?

Tags: python gzip
10 answers
聊天终结者
#2 · 2019-01-27 18:35

The uncompressed size is stored in the last 4 bytes of the gzip file, as a little-endian unsigned integer. We can read those bytes and convert them to an int. (This only works for files under 4 GB.)

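import struct

def getuncompressedsize(filename):
    with open(filename, 'rb') as f:
        # Jump to 4 bytes before the end of the file (whence=2 means "from end")
        f.seek(-4, 2)
        # '<I' = little-endian unsigned 32-bit int, independent of host byte order
        return struct.unpack('<I', f.read(4))[0]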
够拽才男人
#3 · 2019-01-27 18:37
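import struct

# Read the 4-byte size trailer from the raw file. gzip.open() would return
# decompressed data instead, and gzip.read32 is an undocumented helper.
with open("input.gz", "rb") as File:
    File.seek(-4, 2)
    Size = struct.unpack("<I", File.read(4))[0]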
等我变得足够好
#4 · 2019-01-27 18:38
    import gzip

    f = gzip.open(filename)
    # kludge - report the position in the compressed file, so progress bars
    # based on the compressed file size don't go to 400%
    f.tell = f.fileobj.tell
三岁会撩人
#5 · 2019-01-27 18:38

Despite what the other answers say, the last four bytes are not a reliable way to get the uncompressed length of a gzip file. First, there may be multiple members in the gzip file, so that would only be the length of the last member. Second, the length may be more than 4 GB, in which case the last four bytes represent the length modulo 2^32. Not the length.
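
To illustrate the first point, here is a minimal sketch that builds a two-member gzip stream in memory and compares the trailer value with the true decompressed length. The helper name trailer_size and the sample payloads are made up for this demonstration.

```python
import gzip
import struct

def trailer_size(data: bytes) -> int:
    """Return the value in the last 4 bytes of a gzip stream (little-endian)."""
    return struct.unpack("<I", data[-4:])[0]

# Two independently compressed members, concatenated into one valid gzip stream.
first = b"a" * 1000
second = b"b" * 10
stream = gzip.compress(first) + gzip.compress(second)

actual = len(gzip.decompress(stream))  # decompresses every member: 1010
reported = trailer_size(stream)        # reflects only the last member: 10

print(actual, reported)
```

The trailer reports 10 while the stream actually decompresses to 1010 bytes, so a progress bar sized from the trailer would overshoot badly.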

However, for what you want, there is no need to get the uncompressed length. You can instead base your progress bar on the amount of input consumed, as compared to the length of the gzip file, which is readily obtained. For typical homogeneous data, that progress bar would show essentially the same thing as a progress bar based on the uncompressed data.
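
One way this might look in practice is sketched below. It assumes the file is opened by path; the function name read_with_progress and the chunk size are illustrative choices, not part of the gzip module.

```python
import gzip
import os

def read_with_progress(path, chunk_size=64 * 1024):
    """Yield (chunk, fraction), where fraction is the share of the
    compressed file that has been consumed so far."""
    total = os.path.getsize(path)  # size of the .gz file on disk
    with open(path, "rb") as raw, gzip.GzipFile(fileobj=raw) as gz:
        while True:
            chunk = gz.read(chunk_size)  # decompressed bytes
            if not chunk:
                break
            # raw.tell() is the position in the compressed file. GzipFile
            # pulls input in buffered blocks, so the fraction moves in steps.
            yield chunk, (raw.tell() / total if total else 1.0)
```

Each iteration gives a decompressed chunk plus a value between 0 and 1 that can be fed to any progress bar. Because the compressed input is read in blocks, small files may jump straight to 1.0.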
