zlib/gzip interpreter

Posted 2019-05-28 04:18

Question:

Greetings. I'm trying to analyze the output of the zlib (gzip) algorithm against its input: things like the dictionary size, the substring length/distance pairs, and where they correspond in the original plaintext. I'm using zlib to exchange many very small chunks of data (under 1 KB each), and I want to measure the overhead from the dictionary, the percentage of the output that is substring matches versus literal plaintext, that sort of thing.

A quick Google search didn't turn anything up, so I'm asking here before I start seeding the zlib source code with debug messages to get a similar result.

Does something off-the-shelf already exist for this?

Answer 1:

Take a look at http://zlib.net/infgen.c.gz.

From the comments in the code:

 * Read a zlib, gzip, or raw deflate stream from stdin and write a defgen
 * compatible stream representing that input to stdout (though any specific
 * zlib or gzip header information will be lost).  This is based on the puff.c
 * code to decompress deflate streams.  Note that neither the zlib nor the gzip
 * trailer is checked against the uncompressed data (in fact the uncompressed
 * data is never generated) -- all that is checked is that the trailer is
 * present.