Create dynamic sitemap from URL with Ruby on Rails

Posted 2019-08-29 04:30

Question:

I am currently working on an application that scrapes information from a number of different sites. To find the deep link for the desired topic on a site (e.g. "Forum"), I rely on the sitemap the site provides. As I expand to more sites, I have come across some that don't provide a sitemap, so I was wondering whether there is any way to generate one within Rails, starting from the top-level domain?

I am using Nokogiri and Mechanize to retrieve data, so any functionality in those libraries that could help with this task would be easier to integrate.

Answer 1:

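This can be done with the Spidr gem, which crawls every page it can reach on a given site. The following collects each discovered URL together with the pages that link to it: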

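require 'spidr'

# destination URL => list of pages that link to it
url_map = Hash.new { |hash, key| hash[key] = [] }

Spidr.site('http://intranet.com/') do |spider|
  # every_link yields the current page's URL and the URL each of its links points to
  spider.every_link do |origin, dest|
    url_map[dest] << origin
  end
end
# url_map.keys now holds every link target discovered during the crawl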
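The answer stops at url_map. One possible next step is to write the discovered URLs out in the sitemaps.org XML format. The sketch below assumes you only want http(s) pages on the crawled host (url_map may also contain link targets on other hosts); build_sitemap and xml_escape are hypothetical helpers, not part of Spidr, and the XML is assembled from plain strings so only the standard library is needed.

```ruby
require 'uri'

# Hypothetical helper: the five characters that must be escaped in XML text
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&apos;'
}.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: build a sitemaps.org-style <urlset> from crawled URLs.
# Keeps only http(s) URLs on the given host, drops duplicates, and sorts the
# result so the output is stable between runs.
def build_sitemap(urls, host)
  locs = urls.map { |u| URI(u.to_s) }
             .select { |u| %w[http https].include?(u.scheme) && u.host == host }
             .map(&:to_s)
             .uniq
             .sort

  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  locs.each { |loc| lines << "  <url><loc>#{xml_escape(loc)}</loc></url>" }
  lines << '</urlset>'
  lines.join("\n")
end
```

With the url_map from the answer, something like File.write('sitemap.xml', build_sitemap(url_map.keys, 'intranet.com')) would write the file. Since Nokogiri is already in use, Nokogiri::XML::Builder would be an equally reasonable way to produce the same document.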