It seems like the methods of Ruby's Net::HTTP are all or nothing when it comes to reading the body of a web page. How can I read, say, just the first 100 bytes of the body?
I am trying to read from a content server that returns a short error message in the body of the response if the file requested isn't available. I need to read enough of the body to determine whether the file is there. The files are huge, so I don't want to get the whole body just to check if the file is available.
You can't. But why do you need to? Surely if the page just says that the file isn't available then it won't be a huge page (i.e. by definition, the file won't be there)?
This is an old thread, but according to my research, the question of how to read only part of a file over HTTP in Ruby is still mostly unanswered. Here's a solution I came up with by monkey-patching Net::HTTP a bit:
The rescue catches the IOError that's raised when you call finish on the Net::HTTP session prematurely, before the response body has been read.
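A minimal sketch of this kind of patch follows, assuming Ruby 2.x or 3.x and a server that sends a Content-Length header with an uncompressed, non-chunked body (otherwise the raw socket bytes would include compression or chunk framing). The socket reader added to Net::HTTPResponse is the monkey-patch; peek_body and its limit parameter are hypothetical names, not part of net/http. Instead of calling finish and rescuing the resulting IOError, this variant leaves the request block with return, so the ensure clause in Net::HTTP.start closes the connection.

```ruby
require 'net/http'
require 'uri'

# Monkey-patch: expose the Net::BufferedIO that a response reads its body from.
# @socket is an internal detail of net/http, so this can break between versions.
class Net::HTTPResponse
  attr_reader :socket
end

# Hypothetical helper: returns up to +limit+ raw body bytes, then drops the
# connection. Assumes Content-Length is sent and the body is not chunked.
def peek_body(url, limit = 100)
  uri = URI(url)
  # Ask for an uncompressed body so the raw bytes are the file's own bytes.
  request = Net::HTTP::Get.new(uri, 'Accept-Encoding' => 'identity')
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # With a block, #request hands us the response before reading its body.
    http.request(request) do |response|
      # Never ask for more than the announced length, or #read would wait for
      # bytes that never arrive while the connection stays open.
      n = [limit, response.content_length].compact.min
      # Returning from the method unwinds Net::HTTP.start, whose ensure
      # closes the socket, so the rest of the body is never read.
      return response.socket.read(n)
    end
  end
end
```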
FYI, the socket within the HTTPResponse object isn't a true IO object (it's an internal class called Net::BufferedIO), but it's pretty easy to monkey-patch that, too, to mimic the IO methods you need. For example, another library I was using (exifr) needed the readchar method, which was easy to add:
To read the body of an HTTP response in chunks, you'll need to use Net::HTTPResponse#read_body like this:
Are you sure the content server only returns a short error page?
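Doesn't it also set the HTTP status code to something appropriate, like 404? In that case you can check for it: Net::HTTPResponse#value raises an exception for any non-2xx status (for a 4xx response that is Net::HTTPClientException, named Net::HTTPServerException before Ruby 2.6), and the response object itself will be a Net::HTTPClientError subclass, most likely Net::HTTPNotFound.
If you get an error, your file wasn't there; if you get a 200, the file is starting to download and you can close the connection.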
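Combined with Net::HTTPResponse#read_body, that check might look like the sketch below; fetch_prefix and limit are hypothetical names, and it assumes the server really does answer with an error status for missing files. Net::HTTPResponse#value raises one of the Net::HTTPExceptions for any non-2xx status. On success, read_body with a block yields the body in pieces as they arrive, and returning from the method once enough bytes are in lets Net::HTTP.start close the connection before the rest is downloaded.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns nil if the server answers with an error status,
# otherwise at most +limit+ bytes from the start of the body.
def fetch_prefix(url, limit = 100)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # With a block, #request yields the response before the body is read.
    http.request(Net::HTTP::Get.new(uri)) do |response|
      begin
        response.value # raises for any non-2xx status, e.g. 404 Not Found
      rescue Net::HTTPExceptions
        return nil # leaving early closes the connection; the body is never read
      end
      buffer = ''.b
      # With a block, #read_body yields the body piece by piece as it arrives.
      response.read_body do |chunk|
        buffer << chunk
        # Returning unwinds Net::HTTP.start, whose ensure closes the socket,
        # so the rest of a large body is never downloaded.
        return buffer.byteslice(0, limit) if buffer.bytesize >= limit
      end
      return buffer # the whole body was shorter than +limit+
    end
  end
end
```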
I wanted to do this once, and the only thing I could think of was monkey-patching the Net::HTTPResponse#read_body and (private) Net::HTTPResponse#read_body_0 methods to accept a length parameter: the former just passes the length on to read_body_0, which then reads at most length bytes.
Shouldn't you just use an HTTP HEAD request (Ruby's Net::HTTP::Head request class) to see if the resource is there, and only proceed if you get a 2xx or 3xx response? This presumes your server is configured to return a 4xx error code if the document is not available. I would argue this is the correct solution.
An alternative is to send a HEAD request and look at the Content-Length header value in the response: if your server is correctly configured, you should easily be able to tell the difference in length between a short message and a long document. Another alternative: set the Range header field in the request (e.g. Range: bytes=0-99), which again assumes that the server behaves correctly with respect to the HTTP spec; a compliant server answers with 206 Partial Content and a Content-Range header.
I don't think that solving the problem in the client after you've sent the GET request is the way to go: by that time, the network has done the heavy lifting, and you won't really save any wasted resources.
Reference: HTTP header definitions
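The HEAD and Range suggestions can be sketched in Ruby as follows. remote_size and fetch_range are hypothetical helpers, and both assume a server that follows the spec: an error status for missing files, a Content-Length header on HEAD responses, and 206 Partial Content for honoured ranges. fetch_range returns nil both for an error status and for a server that ignores Range, so a caller that needs to tell those apart would have to inspect the response code itself.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: HEAD fetches the headers only. Returns the announced
# Content-Length, or nil for a non-2xx status (redirects are not followed).
def remote_size(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(Net::HTTP::Head.new(uri))
    response.is_a?(Net::HTTPSuccess) ? response.content_length : nil
  end
end

# Hypothetical helper: asks for bytes 0..limit-1 only. Range is the request
# header; the server reports what it sent in the Content-Range response header.
def fetch_range(url, limit = 100)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri, 'Range' => "bytes=0-#{limit - 1}")
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request) do |response|
      # Anything but 206 Partial Content (a 404, or a 200 from a server that
      # ignores Range) is abandoned unread: returning closes the connection.
      return nil unless response.is_a?(Net::HTTPPartialContent)
      return response.read_body
    end
  end
end
```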