I only need to download the first few kilobytes of a file via HTTP.
I tried
require 'open-uri'
url = 'http://example.com/big-file.dat'
file = open(url)
content = file.read(limit)
But it actually downloads the full file.
I only need to download the first few kilobytes of a file via HTTP.
I tried
require 'open-uri'
url = 'http://example.com/big-file.dat'
file = open(url)
content = file.read(limit)
But it actually downloads the full file.
This seems to work when using sockets:
require 'socket'
host = "download.thinkbroadband.com"
path = "/1GB.zip" # get 1gb sample file
request = "GET #{path} HTTP/1.0\r\n\r\n"
socket = TCPSocket.open(host,80)
socket.print(request)
# find beginning of response body
buffer = ""
while !buffer.match("\r\n\r\n") do
buffer += socket.read(1)
end
response = socket.read(100) #read first 100 bytes of body
puts response
I'm curious if there is a "ruby way".
This is an old thread, but it's still a question that seems mostly unanswered according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:
require 'net/http'
# provide access to the actual socket
class Net::HTTPResponse
attr_reader :socket
end
uri = URI("http://www.example.com/path/to/file")
begin
Net::HTTP.start(uri.host, uri.port) do |http|
request = Net::HTTP::Get.new(uri.request_uri)
# calling request with a block prevents body from being read
http.request(request) do |response|
# do whatever limited reading you want to do with the socket
x = response.socket.read(100);
end
end
rescue IOError
# ignore
end
The rescue catches the IOError that's thrown when you call HTTP.finish prematurely.
FYI, the socket within the HTTPResponse
object isn't a true IO
object (it's an internal class called BufferedIO
), but it's pretty easy to monkey-patch that, too, to mimic the IO
methods you need. For example, another library I was using (exifr) needed the readchar
method, which was easy to add:
class Net::BufferedIO
def readchar
read(1)[0].ord
end
end
Check out "OpenURI returns two different objects". You might be able to abuse the methods in there to interrupt downloading/throw away the rest of the result after a preset limit.