What is your recommendation for writing a web crawler in Ruby? Is there any library better than Mechanize?
If you just want to fetch pages' content, the simplest way is to use open-uri. It is part of the standard library, so it doesn't require any additional gems: you just have to require 'open-uri' and call it. See http://ruby-doc.org/stdlib-2.2.2/libdoc/open-uri/rdoc/OpenURI.html

To parse the content you can use Nokogiri or other gems, which also offer useful features such as XPath queries. You can find other parsing libraries here on SO.
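A minimal sketch of that combination (the URL and the XPath expression are just placeholders):

```ruby
require 'open-uri'
require 'nokogiri'

# Fetch the page body; on older Rubies, plain open(...) works the same way
html = URI.open('http://example.com').read

# Parse it with Nokogiri and pull out data with an XPath query
doc = Nokogiri::HTML(html)
doc.xpath('//a/@href').each do |attr|
  puts attr.value
end
```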
I am working on the pioneer gem, which is not a spider but a simple asynchronous crawler based on the em-synchrony gem.
You might want to check out wombat, which is built on top of Mechanize/Nokogiri and provides a DSL (like Sinatra, for example) for parsing pages. Pretty neat :)
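From memory of Wombat's README, the DSL looks roughly like this (the URL, selectors, and property names below are placeholders, not from the original answer):

```ruby
require 'wombat'

# Crawl a single page and get back a Hash of extracted data
results = Wombat.crawl do
  base_url "http://example.com"  # placeholder site
  path "/"

  # Each property declared here becomes a key in the returned Hash
  headline xpath: "//h1"
  links({ css: "a" }, :list)     # :list collects every match instead of the first
end

puts results.inspect
```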
I'd give anemone a try. It's simple to use, especially if you have to write a simple crawler, and in my opinion it is well designed too. For example, I wrote a Ruby script to search for 404 errors on my sites in a very short time.
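A sketch of that kind of 404 check, assuming Anemone's documented crawl/on_every_page API (the URL is a placeholder):

```ruby
require 'anemone'

# Walk every page of the site and report the ones that come back as 404
Anemone.crawl("http://example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts "404: #{page.url}" if page.code == 404
  end
end
```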
I just released one recently called Klepto. It's got a pretty simple DSL, is built on top of Capybara, and has a lot of cool configuration options.