Why is the end of the input stream never reached u

Posted 2019-05-13 00:36

Question:

I am writing a simple proxy in Java and am having trouble reading an entire request into a byte array. In the loop below, the call to read blocks even though the client has already sent all the data it is going to send; the end of the stream is never reached. Because I can't start writing output until I know I have read the whole input, this is a problem. If I kill the connection to the server, the end of the stream is finally reached and everything goes off without a hitch: all of the data from the client (in this case Firefox requesting www.google.com) has been read by the server, which can process it as required, though obviously it can no longer send anything back to the client.

public static void copyStream(InputStream is, OutputStream os) throws IOException
{
    int read;
    byte[] buffer = new byte[BUFFER_SIZE];
    // Loop until read() signals end of stream by returning -1.
    while ((read = is.read(buffer, 0, BUFFER_SIZE)) != -1)
    {
        os.write(buffer, 0, read);
    }
}

The InputStream comes directly from the client socket (getInputStream(), then buffered); the OutputStream is a ByteArrayOutputStream.

What am I doing wrong?

Answer 1:

In HTTP, the Content-Length header normally tells you how much data you're supposed to read from the stream: it gives the number of bytes that follow the blank line (a double \r\n, not just a double newline) marking the end of the HTTP headers. See W3C for more info...
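
A minimal sketch of that approach in Java follows. The class and method names (RequestReader, readHeaders, contentLength, readBody) are made up for illustration, and the code assumes that a request without a Content-Length header has no body, which ignores chunked requests. It reads bytes up to the blank line, extracts Content-Length, and then reads exactly that many body bytes instead of waiting for end of stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative, not part of any library.
final class RequestReader {

    /** Reads the header block, up to and including the terminating CRLF CRLF. */
    public static String readHeaders(InputStream in) throws IOException {
        final byte[] end = {'\r', '\n', '\r', '\n'};
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int matched = 0; // number of bytes of CRLF CRLF matched so far
        while (matched < end.length) {
            int b = in.read();
            if (b == -1) {
                throw new EOFException("stream closed before end of headers");
            }
            head.write(b);
            if (b == end[matched]) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0; // a stray CR may start a new match
            }
        }
        return new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Returns the Content-Length value, or 0 if the header is absent (assumption: no body). */
    public static int contentLength(String headers) {
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return 0;
    }

    /** Reads exactly n bytes; loops because read() may return fewer bytes than requested. */
    public static byte[] readBody(InputStream in, int n) throws IOException {
        byte[] body = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(body, off, n - off);
            if (r == -1) {
                throw new EOFException("stream closed after " + off + " of " + n + " body bytes");
            }
            off += r;
        }
        return body;
    }
}
```

With these helpers, the proxy would call readHeaders, then contentLength, then readBody on the same stream, and could start writing output as soon as readBody returns.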

If there is no Content-Length header sent, you could try interrupting the read after a certain amount of time passes with no data sent over the connection, although that's definitely not preferable.
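
For completeness, here is a rough sketch of that timeout fallback, using Socket.setSoTimeout so that a blocked read gives up after a period of silence. The class name IdleTimeoutReader, the method readUntilIdle, and the 8192-byte buffer are illustrative choices. Treating silence as the end of the request is only a guess, which is why this is a last resort.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper; the names are illustrative, not part of any library.
final class IdleTimeoutReader {

    /** Reads until end of stream, or until no data has arrived for idleMillis. */
    public static byte[] readUntilIdle(Socket socket, int idleMillis) throws IOException {
        // After this call, a read() that blocks longer than idleMillis
        // throws SocketTimeoutException; the socket itself stays usable.
        socket.setSoTimeout(idleMillis);
        InputStream in = socket.getInputStream();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (SocketTimeoutException idle) {
            // Assumption: silence means the client has finished sending.
            // A slow client would be cut off here.
        }
        return out.toByteArray();
    }
}
```

The timeout adds latency to every request and can truncate data from a slow client, so it only makes sense when nothing in the message itself says where the message ends.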

(I'm assuming that you're going to process the data you're reading somehow; otherwise you could just write out each byte as you read it.)



Answer 2:

HTTP 1.1, supported by all modern browsers, has a feature called "keep-alive", or "persistent connections", in which clients are allowed by default to reuse an HTTP 1.1 connection to a server for several requests (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html). So if you point Firefox at http://www.google.com, the connection to www.google.com:80 will remain open for a while, even after the first request has completed. Your application therefore cannot know whether all the data has been sent without a basic understanding of the HTTP protocol. You can work around that to some extent by using a timeout on the connection, hoping that the client is not stuck somewhere and that silence really does mean the end of the data block. Another way would be to rewrite the server response headers to advertise your proxy as HTTP 1.0 compliant rather than 1.1, thus forbidding the client to use persistent connections.



Answer 3:

Keep in mind that not all connections will have a Content-Length header; some may use Transfer-Encoding: chunked, in which the body is sent as a series of chunks and each chunk's length is encoded as part of the body itself.
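
To illustrate, here is a hedged sketch of decoding a chunked body in Java. The names ChunkedBodyReader and readChunkedBody are made up for this example. Each chunk starts with its size in hexadecimal on its own line, the chunk data is followed by \r\n, and a chunk of size 0 followed by optional trailer lines and a blank line ends the body. The sketch returns the decoded payload; a proxy that forwards the message unchanged would need to keep the raw bytes as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; the names are illustrative, not part of any library.
final class ChunkedBodyReader {

    /** Reads one line terminated by LF and returns it without the trailing CR LF. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                int len = sb.length();
                if (len > 0 && sb.charAt(len - 1) == '\r') {
                    sb.setLength(len - 1);
                }
                return sb.toString();
            }
            sb.append((char) b);
        }
        throw new EOFException("stream closed in the middle of a line");
    }

    /** Decodes a chunked body and stops right after its final blank line. */
    public static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // ignore optional chunk extensions
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);
            if (size == 0) {
                break; // last chunk
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int r = in.read(chunk, off, size - off);
                if (r == -1) {
                    throw new EOFException("stream closed inside a chunk");
                }
                off += r;
            }
            body.write(chunk, 0, size);
            readLine(in); // consume the CR LF that follows the chunk data
        }
        // Skip optional trailer headers up to the terminating blank line.
        while (!readLine(in).isEmpty()) {
            // trailer lines are ignored in this sketch
        }
        return body.toByteArray();
    }
}
```

Because the decoder stops at the terminating blank line, the end of the message is known without waiting for the client to close the connection.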