I'm trying to learn Ruby, so I'm following an exercise from Google's developer site. I'm trying to parse some links. In the case of a successful redirection (I know a link can only be redirected once), I get "redirection forbidden". I noticed that the redirect goes from an http link to an https link. Any concrete idea how I could implement this in Ruby, given that Google's exercise is written for Python?
error:
ruby fix.rb
redirection forbidden: http://code.google.com/edu/languages/google-python-class/images/puzzle/p-bija-baei.jpg -> https://developers.google.com/edu/python/images/puzzle/p-bija-baei.jpg?csw=1
Code that should achieve what I'm looking for:
# urls: list of valid URLs (!checked); imgs: list of the images I'll download afterwards.
def acquireData(urls, imgs)
  begin
    urls.each do |url|
      page = Nokogiri::HTML(open(url))
      puts page.body
    end
  rescue Exception => e
    puts e
  end
end
Basically, the code.google URL that you're trying to open redirects to an https URL. You can see that for yourself if you paste
http://code.google.com/edu/languages/google-python-class/images/puzzle/p-bija-baei.jpg
into your browser. There is a bug report that explains why open-uri can't redirect to https.
So the solution to your problem is simply: use a different set of URLs (ones that don't redirect to https).
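If you would rather check this from Ruby than from the browser, here is a minimal sketch that asks the server once and reports where it wants to send you. The method name redirect_target is made up for illustration; only Net::HTTP from the standard library is assumed.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns the Location header if +url+ answers with an
# HTTP redirect (3xx), or nil if it does not redirect.
def redirect_target(url)
  response = Net::HTTP.get_response(URI(url))
  response.is_a?(Net::HTTPRedirection) ? response['location'] : nil
end

# Example call (needs network access; the result depends on the server):
#   puts redirect_target('http://code.google.com/edu/languages/google-python-class/images/puzzle/p-bija-baei.jpg')
```

A URL for which this returns an https address is one that open-uri refuses with "redirection forbidden" in your setup.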
Ruby's OpenURI will automatically handle redirects for you, as long as they're not "meta-refresh" redirects that occur inside the HTML itself.
For instance, opening "www.example.org" follows a redirect automatically.
In other words, the request to "www.example.org" got redirected to "www.iana.org" and OpenURI tracked it correctly.
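A minimal sketch of how you can observe this yourself: OpenURI exposes the address the content was finally served from as base_uri. The method name resolved_uri is made up for illustration, and URI.open assumes Ruby 2.5 or later (older versions used Kernel#open after requiring open-uri).

```ruby
require 'open-uri'

# Hypothetical helper: opens +url+ and returns the URI the content was finally
# served from. OpenURI follows HTTP 3xx redirects itself, and base_uri
# reflects the last hop.
def resolved_uri(url)
  URI.open(url) { |io| io.base_uri }
end

# Example call (needs network access; the result depends on the server):
#   puts resolved_uri('http://www.example.org')
```

If the returned URI differs from the one you passed in, a redirect was followed along the way.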
If you are trying to learn HOW to handle redirects yourself, read the Net::HTTP documentation, which includes an example of following redirects.
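A minimal sketch of that approach, modelled on the "Following Redirection" example in the Net::HTTP documentation. The method name fetch and the default limit of 10 are illustrative, and the URI.join call is my own addition so that relative Location headers also work.

```ruby
require 'net/http'
require 'uri'

# Fetches +uri_str+, following at most +limit+ redirects by hand.
# Because Net::HTTP.get_response picks SSL from the URI scheme, a hop from
# http to https is followed like any other redirect.
def fetch(uri_str, limit = 10)
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  response = Net::HTTP.get_response(URI(uri_str))

  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    # Resolve the Location header against the current URL (it may be relative).
    location = URI.join(uri_str, response['location']).to_s
    fetch(location, limit - 1)
  else
    response.value # raises an exception for non-2xx responses
  end
end

# Example call (needs network access):
#   puts fetch('http://www.example.org').body
```

The limit argument is what stops two pages that redirect to each other from looping forever.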
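If you want to handle meta-refresh statements, you have to find the meta http-equiv="refresh" tag in the HTML and extract its target URL yourself, because OpenURI will not follow it.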
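A minimal sketch of extracting a meta-refresh target, using only the standard library. The names META_REFRESH and meta_refresh_target are made up for illustration, and the regular expressions are a simplification that assumes a quoted content attribute; a real HTML parser such as Nokogiri would be more robust.

```ruby
require 'uri'

# Matches a <meta ... http-equiv="refresh" ...> tag (simplified, case-insensitive).
META_REFRESH = /<meta[^>]+http-equiv\s*=\s*["']?refresh["']?[^>]*>/i

# Hypothetical helper: returns the absolute URL a meta-refresh tag in +html+
# points to, resolved against +base_url+, or nil if there is none.
def meta_refresh_target(html, base_url)
  tag = html[META_REFRESH]
  return nil unless tag

  # The content attribute looks like: 5; URL='http://example.com/'
  content = tag[/content\s*=\s*"([^"]*)"/i, 1] || tag[/content\s*=\s*'([^']*)'/i, 1]
  return nil unless content

  target = content[/url\s*=\s*(.+)\z/i, 1]
  return nil unless target

  URI.join(base_url, target.strip.delete("'\"")).to_s
end

html = <<~HTML
  <html><head>
  <meta http-equiv="refresh" content="5; URL='http://example.com/'">
  </head></html>
HTML

puts meta_refresh_target(html, 'http://example.org/')
# prints: http://example.com/
```

You can then pass the returned URL to whatever you use to fetch pages.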