可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I am writing an HTTP proxy and I am having trouble understanding some details of making a CONNECT request over TLS. To get a better picture, I am experimenting with Apache to observe how it interacts with clients. This is from my default virtual host.

NameVirtualHost *:443
<VirtualHost>
  ServerName example.com
  DocumentRoot htdocs/example.com  
  ProxyRequests On
  AllowConnect 22
  SSLEngine on
  SSLCertificateFile /root/ssl/example.com-startssl.pem
  SSLCertificateKeyFile /root/ssl/example.com-startssl.key
  SSLCertificateChainFile /root/ssl/sub.class1.server.ca.pem
  SSLStrictSNIVHostCheck off
</VirtualHost>

The conversation between Apache and my client goes like this.

a. client connects to example.com:443 and sends example.com in the TLS handshake.

b. client sends HTTP request.

CONNECT 192.168.1.1:22 HTTP/1.1
Host: example.com
Proxy-Connection: Keep-Alive

c. Apache says HTTP/1.1 400 Bad Request. The Apache error log says

Hostname example.com provided via SNI and hostname 192.168.1.1
provided via HTTP are different.

It appears that Apache does not look at the Host header other than to see that it is there since HTTP/1.1 requires it. I get identical failed behavior if the client sends Host: foo. If I make the HTTP request to example.com:80 without TLS, then Apache will connect me to 192.168.1.1:22.

I don't completely understand this behavior. Is there something wrong with the CONNECT request? I can't seem to locate the relevant parts of the RFCs that explain all this.

回答1:

It's not clear whether you're trying to use Apache Httpd as a proxy server, this would explain the 400 status code you're getting. CONNECT is used by the client, and sent to the proxy server (possibly Apache Httpd, but usually not), not to the destination web server.

CONNECT is used between the client and the proxy server before establishing the TLS connection between the client and the end server. The client (C) connects to the proxy (P) proxy.example.com and sends this request (including blank line):

C->P: CONNECT www.example.com:443 HTTP/1.1
C->P: Host: www.example.com:443
C->P:

The proxy opens a TCP connection to www.example.com:443 (P-S) and responds to the client with a 200 status code, accepting the request:

P->C: 200 OK
P->C:

After this, the connection between the client and the proxy (C-P) is kept open. The proxy server relays everything on the C-P connection to and from P-S. The client upgrades its active (P-S) connection to an SSL/TLS connection, by initiating a TLS handshake on that channel. Since everything is now relayed to the server, it's as if the TLS exchange was done directly with www.example.com:443.

The proxy doesn't play any role in the handshake (and thus with SNI). The TLS handshake effectively happens directly between the client and the end server.

If you're writing a proxy server, all you need to do for allowing your clients to connect to HTTPS servers is read in the CONNECT request, make a connection from the proxy to the end server (given in the CONNECT request), send the client with a 200 OK reply and then forward everything that you read from the client to the server, and vice versa.

RFC 2616 treats CONNECT as a a way to establish a simple tunnel (which it is). There is more about it in RFC 2817, although the rest of RFC 2817 (upgrades to TLS within a non-proxy HTTP connection) is rarely used.

It looks like what you're trying to do is to have the connection between the client (C) and the proxy (P) over TLS. That's fine, but the client won't use CONNECT to connect to external web servers (unless it's a connection to an HTTPS server too).

回答2:

From RFC 2616 (section 14.23):

The Host request-header field specifies the Internet host and port number of the resource being requested, as obtained from the original URI given by the user or referring resource (generally an HTTP URL, as described in section 3.2.2). The Host field value MUST represent the naming authority of the origin server or gateway given by the original URL.

My understanding is that you need to copy the address from CONNECT line to HOST line. All in all, the address of the resource is 192.168.1.1, and the fact that you are connecting via example.com doesn't change anything from RFC point of view.

回答3:

You're doing everything right. It's Apache that got things wrong. Support for CONNECT over TLS was only added recently (https://issues.apache.org/bugzilla/show_bug.cgi?id=29744) and there's still some things to be ironed out. The issue you're hitting is one of them.

回答4:

It is quite seldom to see CONNECT Method inside TLS (https). I actually don't know any client who does that (and I would be interested to know who it does, cause I think it is actually a good feature).

Normally the client connects with http (plain tcp) to the proxy and sends the CONNECT method (and host header) to host:443. Then the proxy will make a transparent connection to the endpoint and then the client sends the SSL handshake through.

In this scenario the data is ssl protected "end to end".

The CONNECT method is not really specified, it is only reserved in the HTTP RFC. But typically it is quite simple so it is interoperable. The Method specifies host[:port]. Host: header can simply be ignored. Some additional proxy authentication headers might be needed. When the body of the connection begins no parsing has to happen by the proxy anymore (some do, because they check for valid SSL handshake).