Mechanize/Ruby read source code of 404 page

Posted 2019-04-12 15:45

Question:

All I'm doing is loading Mechanize and getting a page that returns 404, which is exactly what I want. The 404 page has plenty of HTML I'd like to use in my example.

require 'mechanize'

a = Mechanize.new
a.get('http://www.youtube.com/watch?v=e4g8jriw4rg')
a.page
=> nil

I can't seem to find any further info on this.

Answer 1:

You need to handle the exception:

begin
  page = a.get 'http://www.youtube.com/watch?v=e4g8jriw4rg'
rescue Mechanize::ResponseCodeError => e
  puts e.response_code # the status code as a string, e.g. "404"
  page = e.page
end

puts page.title


Answer 2:

Rescuing the exception may have been the only option when the answer above was written (the code changed about 5 years ago), but it no longer is. You can now set allowed_error_codes on the agent instance to an array of Integers or Strings listing the HTTP response codes you want handled without an exception. The docs (as of this writing) note that "2xx, 3xx and 401 status codes will be handled without checking this list."
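
A minimal sketch of the allowed_error_codes approach, in case it helps. It assumes the accessor lives on the underlying Mechanize::HTTP::Agent, reached through Mechanize#agent. fetch_allowing is a hypothetical helper name, not part of the gem, and it takes an already-built agent, so the block itself does not load the gem.

```ruby
# Hypothetical helper (not part of Mechanize): fetch `url` and return the
# page even when the server answers with one of `codes`.
#
# `mech` is assumed to be a Mechanize instance, e.g.:
#   require 'mechanize'
#   mech = Mechanize.new
def fetch_allowing(mech, url, codes = [404])
  # Assumption: allowed_error_codes is an accessor on the underlying
  # Mechanize::HTTP::Agent, which Mechanize exposes as #agent.
  mech.agent.allowed_error_codes = codes
  mech.get(url) # listed codes should no longer raise ResponseCodeError
end
```

Called as page = fetch_allowing(Mechanize.new, 'http://www.youtube.com/watch?v=e4g8jriw4rg'), this should hand back the 404 page itself, so page.code and page.title can be read without a rescue block. Error codes that are not in the list would still raise.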