What is your recommendation for writing a web crawler in Ruby? Is there any library better than Mechanize?
If you just want to fetch pages' content, the simplest way is to use open-uri. It is part of the standard library, so it doesn't require any additional gems: you just have to require 'open-uri' and call it. See http://ruby-doc.org/stdlib-2.2.2/libdoc/open-uri/rdoc/OpenURI.html

To parse the content you can use Nokogiri or other gems, which also offer useful features such as XPath queries. You can find other parsing libraries here on SO.
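A minimal sketch of that combination (the URL and the XPath expression are just placeholders):

```ruby
require 'open-uri'
require 'nokogiri'

# Fetch the page body; on older Rubies, plain open(...) works the same way
html = URI.open('http://example.com').read

# Parse it with Nokogiri and pull out data with an XPath query
doc = Nokogiri::HTML(html)
doc.xpath('//a/@href').each do |attr|
  puts attr.value
end
```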
I am working on the pioneer gem, which is not a spider but a simple asynchronous crawler based on the em-synchrony gem.
You might want to check out wombat, which is built on top of Mechanize/Nokogiri and provides a DSL (like Sinatra, for example) for parsing pages. Pretty neat :)
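From memory of Wombat's README, the DSL looks roughly like this (the URL, selectors, and property names below are placeholders, not from the original answer):

```ruby
require 'wombat'

# Crawl a single page and get back a Hash of extracted data
results = Wombat.crawl do
  base_url "http://example.com"  # placeholder site
  path "/"

  # Each property declared here becomes a key in the returned Hash
  headline xpath: "//h1"
  links({ css: "a" }, :list)     # :list collects every match instead of the first
end

puts results.inspect
```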
I'd give anemone a try. It's simple to use, especially if you have to write a simple crawler, and in my opinion it is well designed too. For example, I wrote a Ruby script to search for 404 errors on my sites in a very short time.
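A sketch of that kind of 404 check, assuming Anemone's documented crawl/on_every_page API (the URL is a placeholder):

```ruby
require 'anemone'

# Walk every page of the site and report the ones that come back as 404
Anemone.crawl("http://example.com/") do |anemone|
  anemone.on_every_page do |page|
    puts "404: #{page.url}" if page.code == 404
  end
end
```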
I just released one recently called Klepto. It's got a pretty simple DSL, is built on top of Capybara, and has a lot of cool configuration options.