Create dynamic sitemap from URL with Ruby on Rails

2019-08-29 04:33发布

I am currently working on an application where I scrape information from a number of different sites. To get the deeplink for the desired topic on a site I rely on the sitemap that is provided (e.g. "Forum"). As I am expanding I came across some sites that don't provide a sitemap themselves, so I was wondering if there was any way to generate it within Rails from the top level domain?

I am using Nokogiri and Mechanize to retrieve data, so if there is any functionality that could help to tackle that task it would be easier to integrate.

1条回答
Juvenile、少年°
2楼-- · 2019-08-29 05:00

This can be done with the Spidr gem like so:

url_map = Hash.new { |hash,key| hash[key] = [] }

Spidr.site('http://intranet.com/') do |spider|
  spider.every_link do |origin,dest|
    url_map[dest] << origin
  end
end
查看更多
登录 后发表回答