可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

When a browser sends an HTTP request to a web server, what encoding is used to encode the HTTP protocol on the wire? Is it ASCII? UTF8? or UTF16? Or does it specify which encoding it uses in a predefined format (before any decoding takes place?)

P.S I'm not asking about the actual payload (e.g. HTML) of the request/response. I'm asking about the request line (i.e. GET /index.html HTTP/1.1) and headers (i.e. Host: google.com)

回答1:

HTTP 1.1 uses US-ASCII as basic character set for the request line in requests, the status line in responses (except the reason phrase) and the field names but allows any octet in the field values and the message body.

回答2:

RFC 2616 includes this:

OCTET          = <any 8-bit sequence of data>
CHAR           = <any US-ASCII character (octets 0 - 127)>
UPALPHA        = <any US-ASCII uppercase letter "A".."Z">
LOALPHA        = <any US-ASCII lowercase letter "a".."z">
ALPHA          = UPALPHA | LOALPHA
DIGIT          = <any US-ASCII digit "0".."9">
CTL            = <any US-ASCII control character
                  (octets 0 - 31) and DEL (127)>
CR             = <US-ASCII CR, carriage return (13)>
LF             = <US-ASCII LF, linefeed (10)>
SP             = <US-ASCII SP, space (32)>
HT             = <US-ASCII HT, horizontal-tab (9)>
<">            = <US-ASCII double-quote mark (34)>

And then pretty much everything else in the document is defined in terms of those entities (OCTET, CHAR, etc.). So you could look through the RFC to find out which parts of an HTTP request/response can include OCTETs; all other parts must be ASCII. (I'd do it myself, but it'd take a long time)

For the request line specifically, the method name and HTTP version are going to be ASCII characters only, but it's possible that the URL itself could include non-ASCII characters. But if you look at RFC 2396, it says that.