Why nokogiri waits for couple of secongs (3-5) when the server is busy and I'm requesting pages one by one, but when these request are in a loop, nokogiri does not wait and throws the timeout message. I'm using timeout block wrapping the request, but nokogiri does not wait for that time at all. Any suggested procedure on this?
# this is a method from the eng class
def get_page(url,page_type)
begin
timeout(10) do
# Get a Nokogiri::HTML::Document for the page we’re interested in...
@@doc = Nokogiri::HTML(open(url))
end
rescue Timeout::Error
puts "Time out connection request"
raise
end
end
# this is a snippet from the main app calling eng class
# receives a hash with urls and goes throgh asking one by one
def retrieve_in_loop(links)
(0..links.length).each do |idx|
url = links[idx]
puts "Visiting link #{idx} of #{links.length}"
puts "link: #{url}"
begin
@@eng.get_page(url, product)
rescue Exception => e
puts "Error getting url: #{idx} #{url}"
puts "This link will be skeeped. Continuing with next one"
end
end
end
The
timeout
block is simply the max time that that code has to execute inside the block without triggering an exception. It does not affect anything inside Nokogiri or OpenURI.You can set the timeout to a year, but OpenURI can still time out whenever it likes.
So your problem is most likely that OpenURI is timing out on the connection attempt itself. Nokogiri has no timeouts; it's just a parser.
Adjusting read timeout
The only timeout you can adjust on OpenURI is the read timeout. It seems you cannot change the connection timeout through this method:
Adjusting connection timeout
To adjust the connection timeout you would have to go with
Net::HTTP
directly instead:You can also take a look at some additional discussion here:
Ruby Net::HTTP time out
Increase timeout for Net::HTTP