Help with HTTP Intercepting Proxy in Ruby?

2019-03-22 04:58发布

问题:

I have the beginnings of an HTTP Intercepting Proxy written in Ruby:

require 'socket'                # Get sockets from stdlib

server = TCPServer.open(8080)   # Socket to listen on port 8080
loop {                          # Servers run forever
  Thread.start(server.accept) do |client|
    puts "** Got connection!"
    @output = ""
    @host = ""
    @port = 80
    while line = client.gets
        line.chomp!
        if (line =~ /^(GET|CONNECT) .*(\.com|\.net):(.*) (HTTP\/1.1|HTTP\/1.0)$/)
            @port = $3
        elsif (line =~ /^Host: (.*)$/ && @host == "")
            @host = $1
        end
        print line + "\n"
        @output += line + "\n"
        # This *may* cause problems with not getting full requests, 
        # but without this, the loop never returns.
        break if line == ""
    end
    if (@host != "")
        puts "** Got host! (#{@host}:#{@port})"
        out = TCPSocket.open(@host, @port)
        puts "** Got destination!"
        out.print(@output)
        while line = out.gets
            line.chomp!
            if (line =~ /^<proxyinfo>.*<\/proxyinfo>$/)
                # Logic is done here.
            end
            print line + "\n"
            client.print(line + "\n")
        end
        out.close
    end
    client.close
  end
}

This simple proxy that I made parses the destination out of the HTTP request, then reads the HTTP response and performs logic based on special HTML tags. The proxy works for the most part, but seems to have trouble dealing with binary data and HTTPS connections.

How can I fix these problems?

回答1:

First, you would probably be better off building on an existing Ruby HTTP proxy implementation. One such is already available in the Ruby standard library, namely WEBrick::HTTPProxyServer. See for example this related question for an implementation based on that same class: Webrick transparent proxy.

Regarding proxying HTTPS, you can't do much more than just pass the raw bytes. As HTTPS is cryptographically protected, you cannot inspect the contents at the HTTP protocol level. It is just an opaque stream of bytes.



回答2:

WEBrick is blocking I/O ... This mean it does not able to stream the response. For example if you go on a youtube page to see a video, the stream will not be forwarded to your browser until the proxy have downloaded all the video cotent. If you want the video be played in your browser during it download, you have to look for a non blocking I/O solution like EventMachine. For HTTPS the solution is a little bit complicated since you have to develop a man in the middle proxy.



回答3:

This was an old question, but for the sake of completeness here goes another answer.

I've implemented a HTTP/HTTPS interception proxy in Ruby, the project is hosted in github.

The HTTP case is obvious, HTTPS interception in accomplished via an HTTPS server that acts as a reverse proxy (and handles the TLS handshake). I.e.

Client(e.g. Browser) <--> Proxy1 <--> HTTPS Reverse Proxy <--> Target Server

As Valko mentioned, when a client connects to a HTTPS server through a proxy, you'll see a stream of encrypted bytes (since SSL provides end-to-end encryption). But not everything is encrypted, the proxy needs to know to whom the stream of bytes should be forwarded, so the client issues a CONNECT host:port request (being the body of the request the SSL stream).

The trick here is that the first proxy will forward this request to the HTTPS Reverse Proxy instead of the real target server. This reverse proxy will handle the SSL negotiation with the client, have access to the decrypted requests, and send copies (optionally altered versions) of these requests to the real target server by acting as a normal client. It will get the responses from the target server, (optionally) alter the responses, and send them back to the client.