I'm not a Java programmer at all. I try to avoid it at all costs actually, but it is required that I use it for a class (in the school sense). The teacher requires that we use Socket(), BufferedReader(), PrintWriter() and various other things including BufferedReader()'s readLine() method.
Basically, this is the problem I'm having. The documentation clearly states that readLine should return a null at the end of the input stream, but that's not what's happening.
Socket link = new Socket(this.address, 80);
BufferedReader in = new BufferedReader(new InputStreamReader(link.getInputStream()));
PrintWriter out = new PrintWriter(link.getOutputStream(), true);
out.print("GET blah blah blah"); // http request by hand
out.flush(); // send the get please
String s;
while ((s = in.readLine()) != null) {
    // prints the html correctly, hooray!!
    System.out.println(s);
}
Instead of finishing at the end of the HTML, I get a blank line, a 0 and
another blank line and then the next in.readLine() hangs forever. Why?
Where's my null?
I tried out.close() to see if maybe Yahoo! was doing a persistent HTTP
session or something (which I don't think it would do unless I sent a
header saying I'm willing to).
All the Java sockets examples I'm finding on the net seem to indicate the
while loop is the correct form. I just don't know enough Java to debug
this.
Your problem is the transfer encoding “chunked” (the response carries a Transfer-Encoding: chunked header). This is used when the length of the content requested from the web server is not known at the time the response is started. Each chunk consists of the number of bytes being sent, written in hexadecimal, followed by CRLF, followed by the bytes and another CRLF. The end of a response is signalled by a zero-length chunk, which is the exact sequence you are seeing: a 0 followed by an empty line. The web server is now waiting for your next request on the same connection (this is called a “persistent connection” or keep-alive).
You have several possibilities:
- Use HTTP version 1.0. This will cause the webserver to automatically close the connection when a response has been sent completely.
- Specify the “Connection: close” header when sending your request. This will also close the connection.
- Parse transfer encoding “chunked” correctly and simply treat the zero-length chunk as the end of the response, which it is.
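For the third option, here is a minimal sketch of a chunked-body decoder. The class and method names (ChunkedBodySketch, readAsciiLine, readChunkedBody) are made up for illustration. It assumes the stream is positioned just after the empty line that ends the response headers, ignores chunk extensions, and discards any trailer headers. It reads from the raw InputStream rather than a BufferedReader because chunk sizes count bytes, not characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ChunkedBodySketch {
    // Reads one line terminated by LF (a preceding CR is dropped) as ASCII text.
    static String readAsciiLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return line.toString();
            }
            if (b != '\r') {
                line.append((char) b);
            }
        }
        throw new EOFException("stream ended in the middle of a line");
    }

    // Decodes a chunked body and returns the payload bytes.
    static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            // Size line: hexadecimal byte count, optionally followed by ";extension".
            String sizeLine = readAsciiLine(in);
            int semi = sizeLine.indexOf(';');
            String hex = (semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim();
            int size = Integer.parseInt(hex, 16);

            if (size == 0) {
                // Zero-length chunk: skip optional trailers up to the empty line, then stop.
                while (!readAsciiLine(in).isEmpty()) {
                    // trailer header, ignored in this sketch
                }
                return body.toByteArray();
            }

            // Read exactly `size` bytes of chunk data.
            byte[] chunk = new byte[size];
            int read = 0;
            while (read < size) {
                int n = in.read(chunk, read, size - read);
                if (n < 0) {
                    throw new EOFException("stream ended inside a chunk");
                }
                read += n;
            }
            body.write(chunk, 0, size);

            // Consume the CRLF that follows the chunk data.
            readAsciiLine(in);
        }
    }
}
```

Because the decoder returns as soon as it has consumed the zero-length chunk, the caller never issues the extra read that would block on a connection the server is keeping open.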
So you're reading from a socket (you don't show that in your code, but that's what I gather from the text)?
As long as the other side is not closing the connection, Java doesn't know that it's at the end of the input, so readLine() is waiting for the other side to send more data and doesn't return null.
Try GET url HTTP/1.0. The HTTP/1.0 tells the server that you can't handle more than a single document per connection. In this case, the server should close the connection after sending you the result.
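A minimal sketch of that approach, using the same Socket, BufferedReader and PrintWriter classes as the question. The class name Http10FetchSketch and its methods are hypothetical, and the Host header is an addition that HTTP/1.0 does not require but that many servers with virtual hosts expect.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class Http10FetchSketch {
    // Builds a minimal HTTP/1.0 GET request; the final empty line ends the header block.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    // Sends the request and prints each response line until readLine() returns null,
    // which happens once the server has closed the connection.
    static void fetch(String host, String path) throws IOException {
        try (Socket link = new Socket(host, 80);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(link.getOutputStream(), StandardCharsets.US_ASCII));
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(link.getInputStream(), StandardCharsets.ISO_8859_1))) {
            out.print(buildRequest(host, path));
            out.flush(); // send the request, but keep the socket open for reading

            String s;
            while ((s = in.readLine()) != null) {
                System.out.println(s);
            }
        } // try-with-resources closes the streams and the socket here
    }
}
```

A call such as Http10FetchSketch.fetch("example.com", "/") (the host is only a placeholder) should print the status line, the headers and the body, and then return instead of hanging, provided the server honours HTTP/1.0 and closes the connection.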
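Your HTTP request is not complete until it ends with two carriage return + line feed pairs: one terminating the last request or header line, and one forming the empty line that ends the request. Do not close the writer before you have read the response, though: closing a socket's output stream closes the socket itself, so the following readLine() would fail. Close it once the response has been read:
out.print("GET /index.html HTTP/1.0\r\n");
// maybe print optional headers here, each terminated with \r\n
// empty line ends the request
out.print("\r\n");
out.flush();
// ... read the response here, then:
out.close();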