Parsing zlib header

Posted 2019-09-20 06:07

I spent a few days reading the zlib (and gzip and deflate) RFCs, and I can say they are kind of rubbish. Quite a few details are missing, so I'm opening this question.

I'm trying to parse zlib data and I need to know some details about the header.

First of all, RFC says there will be 2 bytes, CMF and FLG.

CMF is divided into two 4-bit fields: CM (the low four bits) and CINFO (the high four bits).

What are the possible values of CM? RFC says that 8 means deflate and that 15 is reserved, but what about the rest of the possible values?

CINFO, on the other hand, should always be 8, if I understand the RFC correctly (please correct me if I'm wrong).

Skipping FLG and the possible FDICT, we get to the Compressed data section. This part of the RFC says:

For compression method 8, the compressed data is stored in the
deflate compressed data format as described in the document
"DEFLATE Compressed Data Format Specification" by L. Peter
Deutsch. (See reference [3] in Chapter 3, below)

What does this mean? Should I assume that CM will always be 8? If yes, then why does the entire CM thing exist?

Last, I'm a bit confused. I always believed zlib can wrap both deflate and gzip, but reading this RFC I can't see where gzip-compressed data fits in here. Is there anything that I'm missing about this?

1 Answer
太酷不给撩
Post #2 · 2019-09-20 06:28

What are the possible values of CM? RFC says that 8 means deflate and that 15 is reserved, but what about the rest of the possible values?

...

Should I assume that CM will always be 8? If yes, then why does the entire CM thing exist?

CM is there for future use and to allow other (non-standard) compression methods:

Other compressed data formats are not specified in this version of the zlib specification. (RFC 1950, "ZLIB Compressed Data Format Specification version 3.3")

You should NOT assume that it's always 8. Instead, you should check it and, if it's not 8, throw a "not supported" error.
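As a minimal sketch of that check in C, assuming the RFC 1950 layout in which CM occupies the low four bits of the CMF byte. The names `zlib_cm`, `zlib_check_method` and `ZLIB_CM_DEFLATE` are made up for illustration and are not part of the zlib API:

```c
#include <stdint.h>

/* Hypothetical constant: 8 is the only method RFC 1950 defines. */
enum { ZLIB_CM_DEFLATE = 8 };

/* Extract CM, the low four bits of the CMF byte. */
static unsigned zlib_cm(uint8_t cmf)
{
    return cmf & 0x0Fu;
}

/* Return 0 if the method is supported (deflate), -1 otherwise. */
static int zlib_check_method(uint8_t cmf)
{
    return zlib_cm(cmf) == ZLIB_CM_DEFLATE ? 0 : -1;
}
```

A caller would pass the first byte of the stream and report "not supported" when the result is -1.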


CINFO, on the other hand, should always be 8, if I understand the RFC correctly (please correct me if I'm wrong).

No, the meaning of CINFO depends on CM. If CM is 8 (the only meaningful standardized value), then:

CINFO is the base-2 logarithm of the LZ77 window size, minus eight (CINFO=7 indicates a 32K window size). Values of CINFO above 7 are not allowed in this version of the specification. (RFC 1950, "ZLIB Compressed Data Format Specification version 3.3")

So in fact CINFO can't be 8: for CM = 8, the valid values are 0 through 7.
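The window-size rule quoted above could be sketched like this in C. It assumes the caller has already confirmed CM = 8; `zlib_window_size` is a hypothetical helper name, not a zlib function:

```c
#include <stdint.h>

/*
 * Return the LZ77 window size in bytes encoded by the CMF byte,
 * or 0 if CINFO is out of range for CM = 8.
 * CINFO is the high four bits of CMF; window = 2^(CINFO + 8).
 */
static uint32_t zlib_window_size(uint8_t cmf)
{
    unsigned cinfo = (cmf >> 4) & 0x0Fu;

    if (cinfo > 7)
        return 0; /* not allowed by RFC 1950 */
    return (uint32_t)1 << (cinfo + 8);
}
```

For the common CMF byte 0x78, CINFO is 7 and the window is 32768 bytes.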


Skipping FLG and the possible FDICT, we get to the Compressed data section. This part of the RFC says:

For compression method 8, the compressed data is stored in the
deflate compressed data format as described in the document
"DEFLATE Compressed Data Format Specification" by L. Peter
Deutsch. (See reference [3] in Chapter 3, below)

What does this mean?

It means that the details of the DEFLATE encoding are not specified in this standard but are described elsewhere, at ftp://ftp.uu.net/pub/archiving/zip/zlib/.

If you prefer, DEFLATE has its own RFC: RFC 1951, "DEFLATE Compressed Data Format Specification version 1.3".
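To locate where the deflate data starts, a parser only needs to skip the two header bytes plus the optional 4-byte DICTID. Here is a rough sketch, assuming the RFC 1950 rule that bit 5 of FLG is the FDICT flag; `zlib_deflate_offset` is a hypothetical name used only for this example:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the offset of the first byte of deflate data in a zlib
 * stream, or 0 if the buffer is too short to hold the header.
 */
static size_t zlib_deflate_offset(const uint8_t *buf, size_t len)
{
    size_t off = 2; /* CMF and FLG */

    if (len < 2)
        return 0;
    if (buf[1] & 0x20u) /* FDICT set: a 4-byte DICTID follows FLG */
        off += 4;
    return len >= off ? off : 0;
}
```

Everything from that offset up to the trailing 4-byte Adler-32 checksum is a raw RFC 1951 deflate stream.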


Last, I'm a bit confused. I always believe zlib can wrap both deflate and gzip, but reading this RFC I can't see where a gzip compressed data fits in here. Is there anything that I'm missing about this?

No, the zlib format can't wrap gzip. gzip and zlib are different wrappers for deflate data (as are the zip format, the PNG format, the PDF format, etc.).

Gzip uses DEFLATE:

The format presently uses the DEFLATE method of compression but can be easily extended to use other compression methods. (RFC 1952, "GZIP file format specification version 4.3")

CM = 8 denotes the "deflate" compression method with a window size up to 32K. This is the method used by gzip and PNG (RFC 1950, "ZLIB Compressed Data Format Specification version 3.3")
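Because the two wrappers begin differently, a parser can usually tell them apart from the first two bytes. The sketch below relies on two facts from the RFCs rather than from this answer: a gzip member starts with the bytes 0x1F 0x8B (RFC 1952), and a zlib header, read as the 16-bit value CMF*256 + FLG, must be a multiple of 31 (RFC 1950). The enum and function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

enum wrapper_kind { WRAPPER_UNKNOWN, WRAPPER_ZLIB, WRAPPER_GZIP };

/* Guess the wrapper format from the first two bytes of a stream. */
static enum wrapper_kind guess_wrapper(const uint8_t *buf, size_t len)
{
    unsigned header;

    if (len < 2)
        return WRAPPER_UNKNOWN;

    /* gzip magic numbers ID1 and ID2 */
    if (buf[0] == 0x1F && buf[1] == 0x8B)
        return WRAPPER_GZIP;

    /* zlib: CM must be 8, CINFO at most 7, and FCHECK must validate */
    header = ((unsigned)buf[0] << 8) | buf[1];
    if ((buf[0] & 0x0F) == 8 && (buf[0] >> 4) <= 7 && header % 31 == 0)
        return WRAPPER_ZLIB;

    return WRAPPER_UNKNOWN;
}
```

This is only a heuristic: raw deflate data has no header at all, so a two-byte check cannot rule it out.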


If you find the RFC unclear or difficult to understand, consider looking into the source code of a zlib implementation. While some implementations may be non-standard, reading the source may help resolve some of your doubts.

Here's an excerpt from the zlib source code (from zlib.net) that answers one of your questions:

#define Z_DEFLATED   8
/* ... */
if (BITS(4) != Z_DEFLATED) { 
    strm->msg = (char *)"unknown compression method";
    state->mode = BAD;
    break;
}