For example, is this a valid ajax request:
$.ajax({
    type: "POST",
    url: "SomePage.aspx/GetSomeObjects",
    contentType: "application/json; charset=utf-8",
    ...
});
This form is sometimes shown as an example, and some software reportedly breaks without the explicit charset.
RFC 4627, which defines the application/json media type, says in section 6 that it accepts no parameters:
The MIME media type for JSON text is application/json.
Type name: application
Subtype name: json
Required parameters: n/a
Optional parameters: n/a
This can be interpreted to mean that charset shouldn't be used with application/json. And section 3 suggests that it is not necessary to specify a charset:
JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.
Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.
00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8
because the UTF-8, UTF-16, and UTF-32 encodings can be inferred from the content. Why, then, does the RFC say that UTF-8 is the default? The RFC specifies no way to choose another character encoding, and the encoding can be determined deterministically anyway. Or are there character encodings other than UTF-8, UTF-16, and UTF-32 that support Unicode?
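The null-pattern table quoted above maps directly to a small detection routine. A minimal sketch in JavaScript (the function name is mine; it assumes, as RFC 4627 does, that the JSON text starts with two ASCII characters):

```javascript
// Detect the Unicode encoding of a JSON octet stream from its first four
// octets, per the null patterns in RFC 4627 section 3.
function detectEncoding(bytes) {
  const [a, b, c, d] = bytes;
  if (a === 0 && b === 0 && c === 0) return "UTF-32BE"; // 00 00 00 xx
  if (a === 0 && c === 0) return "UTF-16BE";            // 00 xx 00 xx
  if (b === 0 && c === 0 && d === 0) return "UTF-32LE"; // xx 00 00 00
  if (b === 0 && d === 0) return "UTF-16LE";            // xx 00 xx 00
  return "UTF-8";                                       // xx xx xx xx
}

// '[' '1' encoded in UTF-16LE is 5B 00 31 00:
console.log(detectEncoding([0x5B, 0x00, 0x31, 0x00])); // "UTF-16LE"
// {"a... in UTF-8 has no null octets:
console.log(detectEncoding([0x7B, 0x22, 0x61, 0x22])); // "UTF-8"
```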
Some argue that charset can be used:
I disagree with your assessment that it must be dropped. RFC 2046
states that "other media types than subtypes of "text" might choose to
employ the charset parameter as defined here," which indicates that
there is no restriction on the presence of the charset parameter on
application types. Additionally, RFC 2045 states that "MIME
implementations must ignore any parameters whose names they do not
recognize." So, it is not reasonable to assume that there is any harm
being done by its presence.
May RFC-compliant software generate content type application/json with a charset parameter? Should RFC-compliant software accept such requests?
application/json doesn't define a charset parameter, so it is incorrect to include one. What RFC 2046 says is that application types in general could have a charset parameter, such as application/xml. But JSON does not.
The more recent JSON RFC 7159 says:
Note: No "charset" parameter is defined for this registration.
Adding one really has no effect on compliant recipients.
i.e., charset must be ignored by compliant recipients.
This is consistent with RFC 2045 ("MIME implementations must ignore any parameters whose names they do not recognize"), because RFC 7159 still specifies "Required parameters: n/a; Optional parameters: n/a" for the application/json MIME media type (no parameters).
JSON text is no longer constrained to be an object or an array, and the old section 3 that determined the character encoding from the first two characters is gone in the new RFC. UTF-8, UTF-16, and UTF-32 are still allowed, but there is no way to specify the encoding (no BOM; UTF-8 is the default).
Can charset parameter be used with application/json content type in http/1.1?
There is no harm if charset=utf-8 is used: UTF-8 is the default encoding for JSON text. Other values might be misleading, since the value must be ignored by compliant recipients anyway. It can only break clients that parse the Content-Type header incorrectly, e.g., by comparing it verbatim with "application/json", or clients that attempt to decode the JSON text with some encoding other than UTF-8.
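To illustrate the kind of breakage meant here: a robust client should compare only the media type and disregard any parameters, rather than comparing the whole header verbatim. A hypothetical sketch (the helper name is mine):

```javascript
// Check whether a Content-Type header denotes JSON, ignoring any parameters
// such as "; charset=utf-8" instead of comparing the header string verbatim.
function isJson(contentType) {
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "application/json";
}

console.log(isJson("application/json; charset=utf-8")); // true
console.log(isJson("application/json"));                // true
console.log(isJson("text/html"));                       // false
```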
May rfc-compliant software generate content type application/json with a charset parameter?
No. No parameters are defined for application/json.
Should rfc-compliant software accept such requests?
Yes, it should. The value of charset must be ignored.
ECMA-404 (The JSON Data Interchange Format) defines JSON text in terms of Unicode code points, i.e., JSON itself specifies no behavior regarding encoding details. ECMA-262 (ECMAScript Language Specification) likewise defines the JSON format on top of String (a Unicode type).
Should RFC-compliant software accept such requests? According to Julian Reschke's answer, apparently not. However, as you pointed out, you will potentially encounter it in the wild and would then have to cope with it if you want to talk to those non-RFC-compliant hosts.
For one, if you have code in place that handles Accept-Charset and the charset part of content types for text-based messages in your HTTP framework, why not just use it for JSON, too? Programming-wise, it is both easier (no special rule for JSON) and more general.
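For example, a single generic charset extractor along these lines (the helper name and the simplified parsing are my assumptions) could serve JSON and all other text-based media types alike:

```javascript
// Extract the charset parameter from any Content-Type header, falling back
// to a caller-supplied default -- no JSON-specific rule needed.
function charsetOf(contentType, fallback) {
  const match = /;\s*charset=("?)([^";]+)\1/i.exec(contentType);
  return match ? match[2].toLowerCase() : fallback;
}

console.log(charsetOf("application/json; charset=UTF-8", "utf-8")); // "utf-8"
console.log(charsetOf('text/html; charset="iso-8859-1"', "utf-8")); // "iso-8859-1"
console.log(charsetOf("application/json", "utf-8"));                // "utf-8"
```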
Personally, I'd say let's go Unicode (using the encoding detection you quote) for every bit of text. Unfortunately, there are client devices out there, e.g. Japanese mobile phones, that don't handle Unicode but only Shift_JIS. They'd otherwise be happy JSON consumers (and paying customers). So what are you going to do? In my particular case, to get these clients on board, I made the charset configurable via the standard HTTP mechanisms.
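Such a setup might look roughly like this sketch, which picks the response charset from the client's Accept-Charset header; the helper name and the simplified parsing (q-values and ordering are ignored) are my assumptions, not the original poster's code:

```javascript
// Choose a response charset from the client's Accept-Charset header,
// restricted to what the server supports, defaulting to UTF-8 (the JSON
// default). Simplified: q-values are stripped but not used for ranking.
function pickCharset(acceptCharset, supported) {
  const wanted = (acceptCharset || "")
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .filter(Boolean);
  for (const charset of wanted) {
    if (supported.includes(charset)) return charset;
  }
  return "utf-8";
}

console.log(pickCharset("Shift_JIS, utf-8;q=0.5", ["shift_jis", "utf-8"])); // "shift_jis"
console.log(pickCharset(null, ["shift_jis", "utf-8"]));                     // "utf-8"
```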
On a side note, HTTP 2.0 is being worked on right now, and if its authors ever hope to create a standard that is rigidly adhered to, they will have to write acceptance tests. That could, of course, also mean excluding the aforementioned legacy clients if the rules can't be bent occasionally.
And what's the point of being compliant if nobody else is but you? I wonder whether even Opera is compliant, or, for that matter, whether all the RFCs it implements can be unambiguously interpreted in the first place. I don't think so, especially in the case of larger ones like HTTP.
If this sounds like HTTP bashing, let me just say this: HTTP is a great standard with concepts that revolutionized not only the internet. The way e.g. resources are specified (statelessness), or the way caching is done, has established good patterns that have trickled down into the implementations of many applications. And HTTP 2 could pick up where 1.1 left off. Let's just hope that SPDY won't be adopted 1 to 1. I hate to say it, but in this case it looks like Microsoft's HTTP Speed+Mobility is in many ways more HTTP-ey than Google's PUSHy (and insofar unRESTy) SPDY.