What confuses me is decoding of HTTP header values.
Example Header:
Some-Header: "quoted string?"; *utf-8'en'Weirdness
Can header values be quoted? What about the encoding of a " itself? Is ' a valid quote character? What's the significance of a semicolon (;)? Could the value parser for an HTTP header be considered a MIME parser?
I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields. That's why I need so much detail on the format.
If you mean does the RFC 5987 parameter production apply to the main part of the header value, then no. Here the main part of the header value would probably be "foo" including the quotes, but...
The specific handling is defined for each named header separately. So a semicolon is significant for, say, Content-Disposition, but not for Content-Length. Obviously this is not a very satisfactory solution, but that's what we're stuck with.
You can't handle these in a generic way; you have to know the form of each possible header. For anything you don't recognise, don't attempt to decompose the header value. And really, so little out there supports RFC 5987 at the moment that it's unlikely you'll be able to do much useful handling of it.
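To make the per-header idea concrete, here is a minimal Python sketch of how a proxy might dispatch on the header name and leave everything else alone. The names split_params, parse_content_disposition, parse_content_length, PARSERS and parse_header are made up for illustration, and the Content-Disposition handling is deliberately simplified (parameter values are kept raw, quotes included), so treat it as a starting point rather than a complete parser.

```python
def split_params(value):
    """Split on semicolons that are outside double-quoted strings.

    Backslash escapes inside quotes are honoured, so an escaped quote
    does not end the quoted string.
    """
    parts, buf, in_quotes, escaped = [], [], False, False
    for ch in value:
        if escaped:
            buf.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == '"':
            buf.append(ch)
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


def parse_content_disposition(value):
    # Semicolons are significant here: disposition type, then parameters.
    parts = split_params(value)
    params = {}
    for item in parts[1:]:
        name, sep, raw = item.partition("=")
        if sep:
            # Values are kept raw (still quoted) in this sketch.
            params[name.strip().lower()] = raw.strip()
    return {"type": parts[0].lower(), "params": params}


def parse_content_length(value):
    # No parameter syntax at all: the value is just a decimal number.
    text = value.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError("Content-Length must be a decimal number")
    return int(text)


# Hypothetical registry: only headers whose format we actually know.
PARSERS = {
    "content-disposition": parse_content_disposition,
    "content-length": parse_content_length,
}


def parse_header(name, value):
    parser = PARSERS.get(name.lower())
    if parser is None:
        return value  # unknown header: pass the value through untouched
    return parser(value)
```

With this, parse_header("Some-Header", ...) hands back the original string unchanged, which is the safe behaviour for a transparent proxy that doesn't know the header's grammar.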
The status quo today is that non-ASCII characters in header values don't work well enough cross-browser to be used at all, either encoded or raw.
Luckily they are rarely needed. The only really common use case is non-ASCII filenames for Content-Disposition, but that's easier to work around by putting the filename in a trailing URL path part instead.
No, an HTTP header value parser isn't a MIME parser. HTTP borrows heavily from MIME and the RFC 822 family of standards in general, but it isn't part of the 822 family. It has its own low-level grammar for headers, which looks like 822 but isn't quite compatible. Arbitrary MIME features can't be used in HTTP; there has to be a standardisation mechanism to drag them into HTTP explicitly, which is what RFC 5987 is, for (parts of) RFC 2231.
(See section 19.4 of RFC 2616 for discussion of some other differences.)
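For the headers where RFC 5987 does apply, the encoded part has the shape charset'language'percent-encoded-value, and the asterisk belongs on the parameter name (as in filename*=...), not on the value. A rough Python sketch of decoding just that value part follows; decode_ext_value is a hypothetical helper name, and limiting the charsets to UTF-8 and ISO-8859-1 is a simplifying assumption on my part.

```python
from urllib.parse import unquote_to_bytes

# Assumption for this sketch: only accept these two charsets.
SUPPORTED_CHARSETS = {"utf-8", "iso-8859-1"}


def decode_ext_value(ext_value):
    """Decode charset'language'pct-encoded into (charset, language, text)."""
    charset, sep1, rest = ext_value.partition("'")
    language, sep2, encoded = rest.partition("'")
    if not (sep1 and sep2):
        raise ValueError("expected charset'language'value")
    charset = charset.lower()
    if charset not in SUPPORTED_CHARSETS:
        raise ValueError("unsupported charset: " + charset)
    # Percent-decoding yields bytes; the charset says how to read them.
    raw = unquote_to_bytes(encoded)
    return charset, language, raw.decode(charset)
```

For example, decode_ext_value("UTF-8''%e2%82%ac%20rates") gives ("utf-8", "", "€ rates"). The language tag may be empty, and invalid byte sequences raise UnicodeDecodeError, which a proxy would probably want to catch and treat as "leave the header alone".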
In theory, a multipart form submission is part of the 822 family and you should be able to use RFC 2231 encoding there. But the reality is browsers don't support that either.