In HTTP you can specify in a request that your client can accept specific content in responses using the Accept header, with values such as application/xml. The content type specification allows you to include parameters in the content type, such as charset=utf-8, indicating that you can accept content with a specified character set.
There is also the Accept-Charset header, which specifies the character encodings accepted by the client.
If both headers are specified and the Accept header contains content types with the charset parameter, which header should the server treat as taking precedence?
e.g.:
Accept: application/xml; q=1, text/plain; charset=ISO-8859-1; q=0.8
Accept-Charset: UTF-8
I've sent a few example requests to various servers using Fiddler to test how they respond:
Examples
W3
Request
GET http://www.w3.org/ HTTP/1.1
Host: www.w3.org
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1
Response
Content-Type: text/html; charset=utf-8
Google
Request
GET http://www.google.co.uk/ HTTP/1.1
Host: www.google.co.uk
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1
Response
Content-Type: text/html; charset=ISO-8859-1
StackOverflow
Request
GET http://stackoverflow.com/ HTTP/1.1
Host: stackoverflow.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1
Response
Content-Type: text/html; charset=utf-8
Microsoft
Request
GET http://www.microsoft.com/ HTTP/1.1
Host: www.microsoft.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1
Response
Content-Type: text/html
There doesn't seem to be any consensus around what the expected behaviour is. I am trying to look surprised.
Although you can set a media type in the Accept header, the charset parameter for that media type is not defined anywhere in RFC 2616 (though it is not forbidden either).
Therefore, if you are going to implement an HTTP/1.1-compliant server, you should first look at the Accept-Charset header, and then look for your own parameters in the Accept header.
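To make that lookup order concrete, here is a minimal sketch, assuming the request headers arrive as a plain dict. The helper names parse_header and requested_charsets are made up for illustration, and q-values are ignored to keep it short; it is not how any particular server does it.

```python
def parse_header(value):
    """Split a comma-separated header value into (token, params) pairs."""
    items = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        token, *raw_params = [p.strip() for p in part.split(";")]
        params = {}
        for raw in raw_params:
            if "=" in raw:
                key, val = raw.split("=", 1)
                params[key.strip().lower()] = val.strip().strip('"')
        items.append((token.lower(), params))
    return items


def requested_charsets(headers):
    """Prefer Accept-Charset; fall back to charset parameters found in Accept."""
    accept_charset = headers.get("Accept-Charset")
    if accept_charset:
        return [token for token, _ in parse_header(accept_charset)]
    accept = headers.get("Accept", "")
    return [
        params["charset"].lower()
        for _, params in parse_header(accept)
        if "charset" in params
    ]
```

With the headers used in the test requests from the question, this returns only the Accept-Charset value; the charset parameter in Accept is consulted only when Accept-Charset is absent.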
Read RFC 2616, Sections 14.1 and 14.2. The Accept header does not allow you to specify a charset. You have to use the Accept-Charset header instead.
Firstly, Accept headers can carry parameters; see https://tools.ietf.org/html/rfc7231#section-5.3.2
All text/* MIME types can accept a charset parameter; see http://www.iana.org/assignments/media-types/media-types.xhtml#text
The Accept-Charset header allows a user-agent to specify the charsets it supports.
If the Accept-Charset header did not exist, a user-agent would have to specify each charset parameter for each text/* media type it accepted, e.g.
Accept: text/html;charset=US-ASCII, text/html;charset=UTF-8, text/plain;charset=US-ASCII, text/plain;charset=UTF-8
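As a rough illustration of how quickly that list grows, the sketch below builds the verbose Accept value from a list of media types and a list of charsets. The function name expand_accept is hypothetical and only here for the example.

```python
from itertools import product


def expand_accept(media_types, charsets):
    """Build an Accept value with one entry per (media type, charset) pair."""
    return ", ".join(
        f"{media_type};charset={charset}"
        for media_type, charset in product(media_types, charsets)
    )


header_value = expand_accept(["text/html", "text/plain"], ["US-ASCII", "UTF-8"])
print("Accept: " + header_value)
```

Two types and two charsets already give four entries, which is the header shown above; every additional charset adds one more entry per media type.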
RFC 7231 section 5.3.2 (Accept) clearly states:
Each media-range might be followed by zero or more applicable media
type parameters (e.g., charset)
So a charset parameter for each content type is allowed. In theory a client could accept, for example, text/html only in UTF-8 and text/plain only in US-ASCII.
But it would usually make more sense to state possible charsets in the Accept-Charset header as that applies to all types mentioned in the Accept header.
If those headers’ charsets don’t overlap, the server could send status 406 Not Acceptable.
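A minimal sketch of that overlap check, under some loud assumptions: q-values are ignored, charset parameters in Accept are pooled across all media ranges rather than matched per type, and the function names (accept_charsets, accept_param_charsets, negotiate) are invented for this example.

```python
def accept_charsets(accept_charset):
    """Charset names listed in an Accept-Charset value (q-values ignored)."""
    return {
        item.split(";")[0].strip().lower()
        for item in accept_charset.split(",")
        if item.strip()
    }


def accept_param_charsets(accept):
    """Charset parameters attached to media ranges in an Accept value."""
    found = set()
    for item in accept.split(","):
        for param in item.split(";")[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "charset":
                found.add(value.strip().strip('"').lower())
    return found


def negotiate(accept, accept_charset, server_charsets):
    """Return (status, charset); 406 when no server charset satisfies both headers."""
    from_header = accept_charsets(accept_charset)
    from_params = accept_param_charsets(accept)
    candidates = [c.lower() for c in server_charsets]
    # An empty source imposes no restriction; "*" in Accept-Charset matches anything.
    if from_header and "*" not in from_header:
        candidates = [c for c in candidates if c in from_header]
    if from_params:
        candidates = [c for c in candidates if c in from_params]
    return (200, candidates[0]) if candidates else (406, None)
```

For the request pattern used in the question (Accept asking for UTF-8, Accept-Charset asking for ISO-8859-1), this sketch ends up at 406 because the two sets do not intersect.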
However, I wouldn’t expect fancy cross-matching from a server for various reasons. It would make the server code more complicated (and therefore more error-prone), while in practice a client would rarely send such requests. Also, nowadays I would expect everything server-side to be in UTF-8 and sent as-is, so there’s nothing to negotiate.
I don't think it matters. The client is doing something dumb; there doesn't need to be interoperability for that :-)