It seems like the methods of Ruby's Net::HTTP are all-or-nothing when it comes to reading the body of a web page. How can I read, say, just the first 100 bytes of the body?
I am trying to read from a content server that returns a short error message in the body of the response if the file requested isn't available. I need to read enough of the body to determine whether the file is there. The files are huge, so I don't want to get the whole body just to check if the file is available.
This is an old thread, but the question of how to read only a portion of a file via HTTP in Ruby is still a mostly unanswered one according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:
require 'net/http'

# provide access to the actual socket
class Net::HTTPResponse
  attr_reader :socket
end

uri = URI("http://www.example.com/path/to/file")
begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    request = Net::HTTP::Get.new(uri.request_uri)
    # calling request with a block prevents the body from being read
    http.request(request) do |response|
      # do whatever limited reading you want to do with the socket
      x = response.socket.read(100)
      # be sure to call finish before exiting the block
      http.finish
    end
  end
rescue IOError
  # ignore
end
The rescue catches the IOError that's raised when you call http.finish prematurely.
FYI, the socket within the HTTPResponse object isn't a true IO object (it's an internal class called BufferedIO), but it's pretty easy to monkey-patch that, too, to mimic the IO methods you need. For example, another library I was using (exifr) needed the readchar method, which was easy to add:
class Net::BufferedIO
  def readchar
    read(1)[0].ord
  end
end
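For illustration, here's a hypothetical sketch of how the two patches might be combined to read EXIF data from a remote JPEG without downloading the whole file. The URL and the date_time call are illustrative only, and exifr's IO handling has changed across versions (older releases are loaded with require 'exifr' instead of require 'exifr/jpeg'), so treat this as a starting point:

require 'net/http'
require 'exifr/jpeg' # gem install exifr; older versions: require 'exifr'

# the two monkey-patches from above
class Net::HTTPResponse
  attr_reader :socket
end

class Net::BufferedIO
  def readchar
    read(1)[0].ord
  end
end

uri = URI("http://www.example.com/photo.jpg") # hypothetical image URL
begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(Net::HTTP::Get.new(uri.request_uri)) do |response|
      # exifr parses just the JPEG headers, so only the first few
      # kilobytes ever come off the socket
      puts EXIFR::JPEG.new(response.socket).date_time
      http.finish
    end
  end
rescue IOError
  # raised by the premature finish, as above
end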
Shouldn't you just use an HTTP HEAD request (Ruby's Net::HTTP::Head request class, or the Net::HTTP#head convenience method) to see if the resource is there, and only proceed if you get a 2xx or 3xx response? This presumes your server is configured to return a 4xx error code if the document is not available. I would argue this is the correct solution.
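A minimal sketch of that check, assuming a hypothetical URL and a server that returns 404 for missing files:

require 'net/http'

uri = URI("http://www.example.com/path/to/file") # hypothetical URL
Net::HTTP.start(uri.host, uri.port) do |http|
  response = http.head(uri.request_uri) # HEAD never transfers a body
  case response
  when Net::HTTPSuccess, Net::HTTPRedirection
    # 2xx/3xx: the file is there; go ahead with the real GET
  else
    # 4xx/5xx: treat the file as unavailable
  end
end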
An alternative is to make a HEAD request and look at the Content-Length header value in the result: if your server is correctly configured, you should easily be able to tell the difference in length between a short error message and a huge file. Another alternative: set the Range header field in the request (the server answers with a 206 status and a Content-Range header), which again assumes that the server is behaving correctly with respect to the HTTP spec.
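A sketch of the Range approach (hypothetical URL; a compliant server replies 206 Partial Content with only the requested bytes):

require 'net/http'

uri = URI("http://www.example.com/path/to/file") # hypothetical URL
request = Net::HTTP::Get.new(uri.request_uri)
request['Range'] = 'bytes=0-99' # ask for just the first 100 bytes

Net::HTTP.start(uri.host, uri.port) do |http|
  response = http.request(request)
  puts response.code # "206" if the server honors ranges, "200" if it ignores them
  puts response.body # at most 100 bytes on a compliant server
end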
I don't think that solving the problem in the client after you've sent the GET request is the way to go: by that time, the network has done the heavy lifting, and you won't really save any wasted resources.
Reference: HTTP header definitions
I wanted to do this once, and the only thing I could think of was monkey-patching the Net::HTTPResponse#read_body and Net::HTTPResponse#read_body_0 methods to accept a length parameter: the former just passes the length through to read_body_0, where you can read only as many as length bytes.
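If you'd rather not touch net/http's private internals, a similar effect is possible with the public block form of read_body. This is a sketch under assumptions (hypothetical URL, and an arbitrary 100-byte cutoff); note that the body arrives in whole segments, so you may end up with slightly more than 100 bytes:

require 'net/http'

class EnoughRead < StandardError; end

uri = URI("http://www.example.com/path/to/file") # hypothetical URL
head = ''
begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_get(uri.request_uri) do |response|
      response.read_body do |segment|
        head << segment
        # bail out once we have enough; the ensure blocks inside
        # net/http close the connection on the way out
        raise EnoughRead if head.bytesize >= 100
      end
    end
  end
rescue EnoughRead
  # head now holds at least the first 100 bytes of the body
end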
To read the body of an HTTP response in chunks, you'll need to use Net::HTTPResponse#read_body like this:
require 'net/http'

Net::HTTP.start('www.example.com') do |http| # example host
  http.request_get('/large_resource') do |response|
    response.read_body do |segment|
      print segment # each segment arrives as it's read off the socket
    end
  end
end
Are you sure the content server only returns a short error page? Doesn't it also set the response status to something appropriate, like 404? In that case you can trap the exception that Net::HTTPResponse#value raises for a 4xx response (Net::HTTPClientException, named Net::HTTPServerException on older Rubies).
If you get an error, your file wasn't there; if you get a 200, the file is starting to download and you can close the connection.
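A sketch of that close-early check, without any monkey-patching (hypothetical URL; the premature finish raises an IOError, exactly as in the first answer):

require 'net/http'

uri = URI("http://www.example.com/path/to/file") # hypothetical URL
found = false
begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    # the block form defers reading the body
    http.request(Net::HTTP::Get.new(uri.request_uri)) do |response|
      found = response.is_a?(Net::HTTPSuccess)
      http.finish # close the connection before the huge body arrives
    end
  end
rescue IOError
  # expected: raised by the premature finish
end
puts(found ? "file exists" : "file not found")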
You can't. But why do you need to? Surely if the page just says that the file isn't available then it won't be a huge page, so reading the whole body in that case costs almost nothing; the body is only huge when the file is actually there?