Mechanize/Ruby read source code of 404 page

Posted 2019-04-12 15:21

All I'm doing is loading Mechanize and getting a page that returns 404, which is exactly what I want: the 404 page has plenty of HTML I'd like to use in my example.

require 'mechanize'

a = Mechanize.new
a.get('http://www.youtube.com/watch?v=e4g8jriw4rg')
a.page
=> nil

I can't seem to find any further info on this.

2 Answers
再贱就再见
#2 · 2019-04-12 16:04

You need to handle the exception:

begin
  page = a.get 'http://www.youtube.com/watch?v=e4g8jriw4rg'
rescue Mechanize::ResponseCodeError => e
  puts e.response_code # the status code as a string, e.g. "404"
  page = e.page
end

puts page.title
劫难
#3 · 2019-04-12 16:13

Rescuing the exception may have been necessary when the earlier answer was written (the code changed about 5 years ago), but it no longer is. You can now set allowed_error_codes on the agent instance to an array of Integers or Strings containing the HTTP response codes you wish to handle without an exception. The docs (as of writing this) note that "2xx, 3xx and 401 status codes will be handled without checking this list."
