Illegal characters in HTTP headers

Posted 2019-01-10 21:56

Question:

I'm creating an HttpURLConnection and need to set multiple custom headers.

I'd like to do something along the lines of the following, but the contents of the header map need to come from a single string. Are there any characters that are illegal, or at least extremely rarely used, in both HTTP header names and HTTP header values?

import java.net.HttpURLConnection;
import java.util.HashMap;
import java.util.Map;

Map<String, String> headers = new HashMap<>();

// TODO: How can I fill the headers map reliably from a single string?

// url is a java.net.URL created elsewhere
HttpURLConnection c = (HttpURLConnection) url.openConnection();
for (Map.Entry<String, String> e : headers.entrySet()) {
    c.setRequestProperty(e.getKey(), e.getValue());
}

Solution for now

It doesn't seem like any HTTP header names contain spaces (they usually use a dash instead?), so I can separate the name from the value with a single space. As for separating the name-value pairs, it seems I'm out of luck, since according to the given answer the value can contain pretty much anything. So I've just picked a character I'm pretty sure will never be used: §. If it turns out it is actually needed, I'll just have to adjust my code :p

Header1 Value1§Header2 Value2§Header3 Value3
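A minimal sketch of parsing that ad-hoc format in Java, assuming `§` separates the entries and the first space in each entry ends the name. `SectionSignHeaders` and `parse` are hypothetical names; the resulting map could feed the `setRequestProperty` loop from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses "Name Value§Name Value§..." into an ordered map.
final class SectionSignHeaders {
    static Map<String, String> parse(String spec) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (spec.isEmpty()) {
            return headers;
        }
        // '\u00A7' is the section sign that separates the name-value pairs.
        for (String entry : spec.split("\u00A7")) {
            // Header names contain no spaces, so the first space ends the name;
            // everything after it (including further spaces) is the value.
            int space = entry.indexOf(' ');
            if (space <= 0) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            headers.put(entry.substring(0, space), entry.substring(space + 1));
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> h = parse("Header1 Value1\u00A7Header2 Value2\u00A7Header3 Value3");
        System.out.println(h); // {Header1=Value1, Header2=Value2, Header3=Value3}
    }
}
```

Treating everything after the first space as the value lets values contain spaces; the only reserved character is `§`.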

Answer 1:

The relevant ABNF from RFC 7230 is:

field-name = token

token = 1*tchar

tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / 
        "." / "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA

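All of these are visible US-ASCII characters: a tchar is any visible US-ASCII character (VCHAR) except the delimiters DQUOTE and (),/:;<=>?@[\]{}.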
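To check a candidate name against the `tchar` rule above, a small validator could look like the following sketch. The class `FieldNames` is hypothetical, and it checks only the `token` grammar, not any header-specific rules.

```java
// Hypothetical validator for the RFC 7230 field-name grammar (token = 1*tchar).
final class FieldNames {
    // The non-alphanumeric characters allowed by the tchar rule.
    private static final String TCHAR_SYMBOLS = "!#$%&'*+-.^_`|~";

    static boolean isTchar(char c) {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || TCHAR_SYMBOLS.indexOf(c) >= 0;
    }

    static boolean isToken(String s) {
        if (s.isEmpty()) {
            return false; // 1*tchar: at least one character is required
        }
        for (int i = 0; i < s.length(); i++) {
            if (!isTchar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isToken("Content-Type")); // true
        System.out.println(isToken("Bad Header"));   // false: space is not a tchar
        System.out.println(isToken("X:Y"));          // false: ':' is a delimiter
    }
}
```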

RFC 7230 is more recent than your question, but in the relevant particulars, it does not change what was formerly said by RFC 2616.

There's a very strong convention for field names that is much more restrictive than what the RFC allows, and implementations enforce it to varying degrees. Field names usually consist of words made of ASCII letters and digits, with only the first letter of each word capitalised, and the words are separated by single hyphens.
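The convention can be expressed as a regular expression. This is a hypothetical style check (`ConventionalNames` is an illustrative name), not something the RFC requires; note that a few registered fields, such as `ETag` and `WWW-Authenticate`, do not fit this pattern.

```java
import java.util.regex.Pattern;

// Hypothetical style check: hyphen-separated words, each one capital letter
// followed by lowercase letters or digits (e.g. "Content-Type", "X-Request-Id").
final class ConventionalNames {
    private static final Pattern CONVENTIONAL =
        Pattern.compile("[A-Z][a-z0-9]*(?:-[A-Z][a-z0-9]*)*");

    static boolean isConventional(String name) {
        return CONVENTIONAL.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isConventional("Content-Type")); // true
        System.out.println(isConventional("X-Request-Id")); // true
        System.out.println(isConventional("HTTP-Version")); // false: several capitals in one word
        System.out.println(isConventional("content-type")); // false: lowercase first letter
    }
}
```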

So, for example, if "HttpUrlConnection" were meant to be an HTTP header name (rather than a Java identifier), you'd call it "Http-Url-Connection".
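Converting a CamelCase identifier into that hyphenated form can be sketched as below. `CamelToHeader` is a hypothetical helper; it assumes ASCII letters and digits and, in line with the advice about acronyms, lowercases everything except the first letter of each word, so `URL` becomes `Url`.

```java
import java.util.Locale;

// Hypothetical helper: "HttpUrlConnection" -> "Http-Url-Connection".
final class CamelToHeader {
    // Split before a capital that follows a lowercase letter or digit ("httpUrl"),
    // and before the capital that starts a new word after an acronym ("URLConnection").
    private static final String WORD_BOUNDARY =
        "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    static String toHeaderName(String identifier) {
        if (identifier.isEmpty()) {
            return identifier;
        }
        StringBuilder sb = new StringBuilder();
        for (String word : identifier.split(WORD_BOUNDARY)) {
            if (sb.length() > 0) {
                sb.append('-');
            }
            // Capitalise only the first letter of each word.
            sb.append(word.substring(0, 1).toUpperCase(Locale.ROOT))
              .append(word.substring(1).toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHeaderName("HttpUrlConnection")); // Http-Url-Connection
        System.out.println(toHeaderName("HttpURLConnection")); // Http-Url-Connection
    }
}
```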

I dimly remember once tracking a bug down to an implementation that was strict enough to reject multiple capitals in one word (the word happened to be an acronym). In other words, it pays to follow this more restrictive format very strictly.

  • Non-ASCII characters play no part in field names, though they may be used in field values.

  • Escaping in field names is not supported by the standard. Escaping of values is not the concern of the HTTP or MIME standards, but you could choose to reuse the standard URL encoding method to encode a set of name-value pairs.
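A sketch of the URL-encoding suggestion: pack the headers into one string as `name=value` pairs joined by `&`, using `java.net.URLEncoder` and `java.net.URLDecoder` (form-style encoding, so spaces become `+`). `HeaderString`, `encode` and `decode` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec: headers <-> "name=value&name=value" with URL-encoded parts.
final class HeaderString {
    static String encode(Map<String, String> headers) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    static Map<String, String> decode(String packed) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (packed.isEmpty()) {
            return headers;
        }
        // Encoded parts never contain a raw '&' or '=', so splitting is safe.
        for (String pair : packed.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Missing '=' in: " + pair);
            }
            headers.put(dec(pair.substring(0, eq)), dec(pair.substring(eq + 1)));
        }
        return headers;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    private static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-Note", "a=b & c\u00A7d");
        headers.put("Accept", "text/html");
        String packed = encode(headers);
        System.out.println(packed);
        System.out.println(decode(packed).equals(headers)); // true
    }
}
```

Unlike a hand-picked separator such as `§`, this reserves no character in names or values, because `&`, `=` and non-ASCII characters are percent-encoded; the decoded map can then be passed to `setRequestProperty` as in the question.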