HTTP 1.1 Persistent Connections using Sockets in J

2019-02-05 08:09发布

问题:

Let's say I have a java program that makes an HTTP request on a server using HTTP 1.1 and doesn't close the connection. I make one request, and read all data returned from the input stream I have bound to the socket. However, upon making a second request, I get no response from the server (or there's a problem with the stream - it doesn't provide any more input). If I make the requests in order (Request, request, read) it works fine, but (request, read, request, read) doesn't.

Could someone shed some insight onto why this might be happening? (Code snippets follow). No matter what I do, the second read loop's isr_reader.read() only ever returns -1.

try{
        connection = new Socket("SomeServer", port);
        con_out = connection.getOutputStream();
        con_in  = connection.getInputStream();
        PrintWriter out_writer = new PrintWriter(con_out, false);
        out_writer.print("GET http://somesite HTTP/1.1\r\n");
        out_writer.print("Host: thehost\r\n");
        //out_writer.print("Content-Length: 0\r\n");
        out_writer.print("\r\n");
        out_writer.flush();

        // If we were not interpreting this data as a character stream, we might need to adjust byte ordering here.
        InputStreamReader isr_reader = new InputStreamReader(con_in);
        char[] streamBuf = new char[8192];
        int amountRead;
        StringBuilder receivedData = new StringBuilder();
        while((amountRead = isr_reader.read(streamBuf)) > 0){
            receivedData.append(streamBuf, 0, amountRead);
        }

// Response is processed here.

        if(connection != null && !connection.isClosed()){
            //System.out.println("Connection Still Open...");

        out_writer.print("GET http://someSite2\r\n");
        out_writer.print("Host: somehost\r\n");
        out_writer.print("Connection: close\r\n");
        out_writer.print("\r\n");
        out_writer.flush();

        streamBuf = new char[8192];
        amountRead = 0;
        receivedData.setLength(0);
        while((amountRead = isr_reader.read(streamBuf)) > 0 || amountRead < 1){
            if (amountRead > 0)
                receivedData.append(streamBuf, 0, amountRead);
        }
}
        // Process response here
    }

Responses to questions: Yes, I'm receiving chunked responses from the server. I'm using raw sockets because of an outside restriction.

Apologies for the mess of code - I was rewriting it from memory and seem to have introduced a few bugs.

So the consensus is I have to either do (request, request, read) and let the server close the stream once I hit the end, or, if I do (request, read, request, read) stop before I hit the end of the stream so that the stream isn't closed.

回答1:

According to your code, the only time you'll even reach the statements dealing with sending the second request is when the server closes the output stream (your input stream) after receiving/responding to the first request.

The reason for that is that your code that is supposed to read only the first response

while((amountRead = isr_reader.read(streamBuf)) > 0) {
  receivedData.append(streamBuf, 0, amountRead);
}

will block until the server closes the output stream (i.e., when read returns -1) or until the read timeout on the socket elapses. In the case of the read timeout, an exception will be thrown and you won't even get to sending the second request.

The problem with HTTP responses is that they don't tell you how many bytes to read from the stream until the end of the response. This is not a big deal for HTTP 1.0 responses, because the server simply closes the connection after the response thus enabling you to obtain the response (status line + headers + body) by simply reading everything until the end of the stream.

With HTTP 1.1 persistent connections you can no longer simply read everything until the end of the stream. You first need to read the status line and the headers, line by line, and then, based on the status code and the headers (such as Content-Length) decide how many bytes to read to obtain the response body (if it's present at all). If you do the above properly, your read operations will complete before the connection is closed or a timeout happens, and you will have read exactly the response the server sent. This will enable you to send the next request and then read the second response in exactly the same manner as the first one.

P.S. Request, request, read might be "working" in the sense that your server supports request pipelining and thus, receives and processes both request, and you, as a result, read both responses into one buffer as your "first" response.

P.P.S Make sure your PrintWriter is using the US-ASCII encoding. Otherwise, depending on your system encoding, the request line and headers of your HTTP requests might be malformed (wrong encoding).



回答2:

Writing a simple http/1.1 client respecting the RFC is not such a difficult task. To solve the problem of the blocking i/o access where reading a socket in java, you must use java.nio classes. SocketChannels give the possibility to perform a non-blocking i/o access.

This is necessary to send HTTP request on a persistent connection.

Furthermore, nio classes will give better performances.

My stress test give to following results :

  • HTTP/1.0 (java.io) -> HTTP/1.0 (java.nio) = +20% faster

  • HTTP/1.0 (java.io) -> HTTP/1.1 (java.nio with persistent connection) = +110% faster



回答3:

Make sure you have a Connection: keep-alive in your request. This may be a moot point though.

What kind of response is the server returning? Are you using chunked transfer? If the server doesn't know the size of the response body, it can't provide a Content-Length header and has to close the connection at the end of the response body to indicate to the client that the content has ended. In this case, the keep-alive won't work. If you're generating content on-the-fly with PHP, JSP etc., you can enable output buffering, check the size of the accumulated body, push the Content-Length header and flush the output buffer.



回答4:

Is there a particular reason you're using raw sockets and not Java's URL Connection or Commons HTTPClient?

HTTP isn't easy to get right. I know Commons HTTP Client can re-use connections like you're trying to do.

If there isn't a specific reason for you using Sockets this is what I would recommend :)



回答5:

Writing your own correct client HTTP/1.1 implementation is nontrivial; historically most people who I've seen attempt it have got it wrong. Their implementation usually ignores the spec and just does what appears to work with one particular test server - in particular, they usually ignore the requirement to be able to handle chunked responses.

Writing your own HTTP client is probably a bad idea, unless you have some VERY strange requirements.