I have 50 Sidekiq threads crawling the web. A few weeks ago, the threads started hanging after about 20 minutes of running. When I dump the backtraces, most of the threads are stuck in net/http's initialize:
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `initialize'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `open'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:878:in `connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb:858:in `start'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:631:in `connection_for'
/app/vendor/bundle/ruby/2.1.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:974:in `response_redirect'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:298:in `fetch'
/app/vendor/bundle/ruby/2.1.0/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
/app/app/workers/crawl_page.rb:24:in `block in perform'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `block in catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35:in `catch'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:106:in `timeout'
I didn't think Sidekiq could get stuck in net/http, because I've wrapped the entire call in a timeout: Timeout::timeout(APP_CONFIG['crawl_page_timeout']) { @page = agent.get(url) }
...but then I started reading some old posts about how Ruby's Timeout is not thread-safe: http://blog.headius.com/2008/02/rubys-threadraise-threadkill-timeoutrb.html
Is Ruby's Timeout still not thread-safe?
I know a lot of people write crawlers in Ruby. If Timeout isn't thread-safe, how do people writing crawlers handle net/http getting stuck?
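For reference, the timeout wrapper described above boils down to the pattern below. This is only a minimal sketch, not my actual worker code: `fetch_with_timeout` and `CRAWL_PAGE_TIMEOUT` are hypothetical names standing in for the worker method and `APP_CONFIG['crawl_page_timeout']`, and a `sleep` stands in for `agent.get(url)`.

```ruby
require 'timeout'

# Hypothetical stand-in for APP_CONFIG['crawl_page_timeout'], in seconds.
CRAWL_PAGE_TIMEOUT = 0.2

# Runs the given block under Timeout.timeout.
# Returns [:ok, value] if the block finishes in time,
# or [:timed_out, nil] if Timeout::Error is raised.
def fetch_with_timeout(seconds)
  [:ok, Timeout.timeout(seconds) { yield }]
rescue Timeout::Error
  [:timed_out, nil]
end

# A block that returns immediately, and one that sleeps past the limit
# (the sleep is a placeholder for a slow agent.get(url) call).
fast = fetch_with_timeout(CRAWL_PAGE_TIMEOUT) { :page }
slow = fetch_with_timeout(CRAWL_PAGE_TIMEOUT) { sleep 2; :page }
```

With a plain `sleep`, the block is interrupted as expected. This sketch does not reproduce the hang. It only shows the behaviour I was relying on when I wrapped the crawl call.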
Update:
I've switched from Mechanize to HTTPClient (which specifically says it's thread-safe). We still appear to be getting stuck during initialization, this time in HTTPClient's create_socket. Again, this could be due to Ruby's Timeout not working properly, or it could be a Sidekiq issue. Here's the stack trace from the most recent hung Sidekiq threads:
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `initialize'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `new'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:805:in `create_socket'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:752:in `block in connect'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91:in `block in timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:101:in `call'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:101:in `timeout'
/app/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:127:in `timeout'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:751:in `connect'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:609:in `query'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient/session.rb:164:in `query'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:1087:in `do_get_block'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/instrumentation/httpclient.rb:34:in `block in do_get_block_with_newrelic'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/cross_app_tracing.rb:43:in `tl_trace_http_request'
/app/vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.2.239/lib/new_relic/agent/instrumentation/httpclient.rb:33:in `do_get_block_with_newrelic'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:891:in `block in do_request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:985:in `protect_keep_alive_disconnected'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:890:in `do_request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:963:in `follow_redirect'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:776:in `request'
/app/vendor/bundle/ruby/2.1.0/gems/httpclient-2.4.0/lib/httpclient.rb:677:in `get'
/app/app/ohm_models/queued_page.rb:20:in `run_crawl'