What is the current state of affairs when it comes to choosing between Transfer-Encoding: gzip and Content-Encoding: gzip when I want to allow clients with e.g. limited bandwidth to signal their willingness to accept a compressed response, with the server having the final say on whether or not to compress?
The latter is what e.g. Apache's mod_deflate and IIS do, if you let them take care of compression. Depending on the size of the content to be compressed, they will additionally apply Transfer-Encoding: chunked.
They will also include a Vary: Accept-Encoding, which already hints at the problem. Content-Encoding seems to be part of the entity, so changing the Content-Encoding amounts to a change of the entity, i.e. a different Accept-Encoding header means e.g. a cache cannot use its cached version of the otherwise identical entity.
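To make the Content-Encoding behaviour described above concrete, here is a minimal Python sketch of server-side negotiation. It is not how mod_deflate or IIS are actually implemented; the function names and the `min_size` threshold are hypothetical and only illustrate the pattern: inspect Accept-Encoding, compress if allowed, and always send Vary: Accept-Encoding so caches keep the variants apart.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value allows gzip with q > 0."""
    qvalues = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0  # a coding without an explicit q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        qvalues[coding] = q
    # An explicit gzip entry takes precedence over the "*" wildcard.
    return qvalues.get("gzip", qvalues.get("*", 0.0)) > 0


def build_response(body: bytes, accept_encoding: str, min_size: int = 256):
    """Return (headers, body); min_size is an assumed, arbitrary threshold."""
    # Vary is sent on every response, compressed or not, because the
    # representation depends on the request's Accept-Encoding header.
    headers = {"Vary": "Accept-Encoding"}
    if len(body) >= min_size and accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body
```

The server keeps the final say here: even when the client offers gzip, small bodies are sent unencoded.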
Is there a definite answer on this that I have missed (and that's not buried inside a message in a long thread in some apache newsgroup)?
My current impression is:
- Transfer-Encoding would in fact be the right way to do what is mostly done with Content-Encoding by existing server and client implementations
- Content-Encoding, because of its semantic implications, carries a couple of issues (what should the server do to the ETag when it transparently compresses a response?)
- The reason is chicken'n'egg: browsers don't support it because servers don't because browsers don't
So I am assuming the right way would be a Transfer-Encoding: gzip (or, if I additionally chunk the body, it would become Transfer-Encoding: gzip, chunked), with no reason to touch Vary or ETag or any other header in that case, as it's a transport-level thing.
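As an illustration of what Transfer-Encoding: gzip, chunked means on the wire, here is a small Python sketch. RFC 2616 lists transfer codings in the order they were applied and requires chunked to be last, so the body is compressed first and then chunk-framed; the receiver undoes the steps in reverse. Note also that a client signals willingness to accept transfer codings with the TE request header rather than Accept-Encoding. The helper names and the default chunk size are assumptions made for the example, and chunk extensions and trailers are ignored.

```python
import gzip


def chunk_encode(data: bytes, chunk_size: int = 1024) -> bytes:
    """Frame data using the HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk: size in hex, CRLF, chunk data, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk followed by an empty trailer section
    return bytes(out)


def chunk_decode(wire: bytes) -> bytes:
    """Reverse chunk_encode; chunk extensions are skipped, trailers ignored."""
    out = bytearray()
    pos = 0
    while True:
        line_end = wire.index(b"\r\n", pos)
        size = int(wire[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        out += wire[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def encode_body(payload: bytes) -> bytes:
    """Sender side of 'Transfer-Encoding: gzip, chunked': gzip, then chunk."""
    return chunk_encode(gzip.compress(payload))


def decode_body(wire: bytes) -> bytes:
    """Receiver side: remove the chunked framing first, then gunzip."""
    return gzip.decompress(chunk_decode(wire))
```

Because both steps are undone before the payload is handed to the application, the entity (and with it ETag and Vary) is the same as for an uncompressed transfer.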
For now I don't care too much about the 'hop-by-hop'-ness of Transfer-Encoding, something that others seem to be concerned about first and foremost, because proxies might uncompress and forward uncompressed to the client. However, proxies might just as well forward it as-is (compressed), if the original request has the proper Accept-Encoding header, which in case of all browsers that I know is a given.
Btw, this issue is at least a decade old, see e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=68517 .
Any clarification on this will be appreciated. Both in terms of what is considered standards-compliant and what is considered practical. For example, HTTP client libraries only supporting transparent "Content-Encoding" would be an argument against practicality.
The correct usage, as defined in RFC 2616 and actually implemented in the wild, is for the client to send an Accept-Encoding request header (the client may specify multiple encodings). The server may then, and only then, encode the response according to the client's supported encodings (if the file data is not already stored in that encoding) and indicate in the Content-Encoding response header which encoding is being used. The client can then read data off of the socket based on the Transfer-Encoding (i.e., chunked) and then decode it based on the Content-Encoding (i.e., gzip).
So, in your case, the client would send an Accept-Encoding: gzip request header, and then the server may decide to compress (if not already) and send a Content-Encoding: gzip and optionally Transfer-Encoding: chunked response header.
And yes, the Transfer-Encoding header can be used in requests, but only for HTTP 1.1, which requires that both client and server implementations support the chunked encoding in both directions.
ETag uniquely identifies the resource data on the server, not the data actually being transmitted. If a given URL resource changes its ETag value, it means the server-side data for that resource has changed.
Quoting Roy T. Fielding, one of the authors of RFC 2616:
Source: https://issues.apache.org/bugzilla/show_bug.cgi?id=39727#c31
In other words: Don't do on-the-fly Content-Encoding, use Transfer-Encoding instead!
Edit: That is, unless you want to serve gzipped content to clients that only understand Content-Encoding. Which, unfortunately, seems to be most of them. But be aware that you leave the realms of the spec and might run into issues such as the one mentioned by Fielding as well as others, e.g. when caching proxies are involved.
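If you do end up compressing on the fly with Content-Encoding, one way to reduce the ETag and caching trouble is to give each encoded variant its own entity tag, since differently encoded representations must not share a strong ETag. The following Python sketch shows the idea; the function name and the "-gzip" style suffix are assumptions chosen for illustration, not something mandated by the spec.

```python
def variant_etag(base_etag: str, content_encoding: str = "identity") -> str:
    """Derive a per-encoding ETag from the ETag of the unencoded entity."""
    if content_encoding == "identity":
        return base_etag
    # Preserve a weak-validator prefix (W/) if the base tag has one.
    weak = base_etag.startswith("W/")
    opaque = base_etag[2:] if weak else base_etag
    opaque = opaque.strip('"')
    tagged = f'"{opaque}-{content_encoding}"'
    return ("W/" + tagged) if weak else tagged
```

A server using this would also have to map the suffixed tag back to the base tag when evaluating If-None-Match or If-Match on later requests.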