I am using Nokogiri to scrape web pages. A few URLs need to be guessed, and they return a 404 Not Found error when they don't exist. Is there a way to capture this exception?
http://yoursite/page/38475 #=> page number 38475 doesn't exist
I tried the following, which didn't work:
url = "http://yoursite/page/38475"
doc = Nokogiri::HTML(open(url)) do
begin
rescue Exception => e
puts "Try again later"
end
end
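It doesn't work because you are not rescuing the part of the code that actually raises the error: the open(url) call, which raises OpenURI::HTTPError when the server responds with a 404 status. That call runs before the block, so the begin/rescue inside the block never sees the exception. The following code should work: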
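require 'nokogiri'
require 'open-uri'

url = 'http://yoursite/page/38475'
begin
  # URI.open is the open-uri entry point on Ruby 2.5+; older versions use plain open(url)
  file = URI.open(url)
  doc = Nokogiri::HTML(file) do
    # handle doc
  end
rescue OpenURI::HTTPError => e
  if e.message == '404 Not Found'
    # handle 404 error
  else
    raise e
  end
end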
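Comparing the full message string works, but it may be a little more robust to check the status code itself. Here is a minimal sketch of that idea, using only open-uri. The helper name `fetch_or_nil` and its `opener:` parameter are hypothetical, not part of any library; `opener:` is there only so the logic can be tried without hitting the network.

```ruby
require 'open-uri'

# Hypothetical helper: returns the response body as a String,
# or nil when the server answers with a 404.
# `opener` defaults to URI.open and is injectable for testing only.
def fetch_or_nil(url, opener: URI.method(:open))
  opener.call(url).read
rescue OpenURI::HTTPError => e
  # e.io.status is a two-element array such as ["404", "Not Found"]
  code = e.io.status.first
  return nil if code == '404'

  raise # any other HTTP error is re-raised unchanged
end
```

With something like this, the guessed URLs can be fetched in a loop, and only the non-nil results passed on to Nokogiri::HTML.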
BTW, about rescuing Exception, see: Why is it a bad style to `rescue Exception => e` in Ruby?
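As a rough illustration of the point behind that question: a plain `rescue` (or `rescue StandardError`) already covers OpenURI::HTTPError, while `rescue Exception` also swallows things like Interrupt (Ctrl-C) and SystemExit. The method name below is made up for the demonstration.

```ruby
require 'open-uri'

# Hypothetical demo method: true if the raised error would be caught by a
# plain `rescue` (i.e. StandardError), false if only `rescue Exception` catches it.
def caught_by_plain_rescue?(error_class)
  raise error_class, 'simulated'
rescue StandardError
  true
rescue Exception # deliberately broad, for demonstration only
  false
end

# OpenURI::HTTPError descends from StandardError, so a narrow rescue is enough.
http_error_is_standard = OpenURI::HTTPError < StandardError
```

Here `caught_by_plain_rescue?(RuntimeError)` is true, while the same call with `Interrupt` or `SystemExit` is false, which is why rescuing Exception can make a script hard to stop.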