Connection pool issue with ActiveRecord objects in

Posted 2019-01-17 16:51

Question:

I'm using rufus-scheduler to run a number of frequent jobs that do various tasks with ActiveRecord objects. If there is any sort of network or PostgreSQL hiccup, then even after recovery, all the threads will throw the following error until the process is restarted:

ActiveRecord::ConnectionTimeoutError (could not obtain a database connection within 5 seconds (waited 5.000122687 seconds). The max pool size is currently 5; consider increasing it.

The error can easily be reproduced by restarting postgres. I've tried increasing the pool size (up to 15), but no luck there.

That leads me to believe the connections are just in a stale state, which I thought would be fixed with the call to clear_stale_cached_connections!.

Is there a more reliable pattern to do this?

The block that is passed is a simple select and update ActiveRecord call, and the error happens no matter what the AR object is.

The rufus job:

scheduler.every '5s' do
  db do
    DataFeed.update  #standard AR select/update
  end
end

wrapper:

  def db(&block)
    begin
      ActiveRecord::Base.connection_pool.clear_stale_cached_connections!
      #ActiveRecord::Base.establish_connection    # this didn't help either way
      yield block
    rescue Exception => e
      raise e
    ensure
      ActiveRecord::Base.connection.close if ActiveRecord::Base.connection
      ActiveRecord::Base.clear_active_connections!
    end
  end

Answer 1:

Rufus scheduler starts a new thread for every job. ActiveRecord, on the other hand, cannot share connections between threads, so it needs to assign each connection to a specific thread.

When your thread doesn't have a connection yet, it will get one from the pool. (If all connections in the pool are in use, it will wait until one is returned by another thread, eventually timing out and throwing ConnectionTimeoutError.)
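To make the checkout/timeout behaviour concrete, here is a rough sketch of a toy pool. ToyPool and everything in it are made up for illustration and are not ActiveRecord's actual implementation; it only mimics the idea that a checked-out connection is unavailable until someone returns it.

```ruby
require "timeout"

# Toy stand-in for a connection pool (hypothetical, NOT ActiveRecord's code).
class ToyPool
  class TimeoutError < StandardError; end

  def initialize(size:, checkout_timeout:)
    @checkout_timeout = checkout_timeout
    @available = Queue.new
    size.times { |i| @available << "conn-#{i}" }
  end

  # Blocks until a connection is free, or raises after checkout_timeout seconds.
  def checkout
    Timeout.timeout(@checkout_timeout) { @available.pop }
  rescue Timeout::Error
    raise TimeoutError, "could not obtain a connection within #{@checkout_timeout} seconds"
  end

  # Returns a connection to the pool so another thread can use it.
  def checkin(conn)
    @available << conn
  end

  # Checks out a connection, yields it, and always checks it back in.
  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn) if conn
  end
end

# Leaky jobs: each thread checks out a connection and never returns it.
leaky = ToyPool.new(size: 2, checkout_timeout: 0.2)
2.times { Thread.new { leaky.checkout }.join }

begin
  leaky.checkout
rescue ToyPool::TimeoutError => e
  puts "leaky pool: #{e.message}"
end

# Well-behaved jobs: ten threads, run one after another, share two connections.
healthy = ToyPool.new(size: 2, checkout_timeout: 0.2)
served = 10.times.map { Thread.new { healthy.with_connection { |c| c } }.value }
puts "healthy pool served #{served.size} jobs"
```

Once the leaky threads have taken both connections, every later checkout times out regardless of how healthy the database is, which matches the symptom in the question. Raising the pool size only delays that point.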

It is your responsibility to return the connection to the pool when you are done with it. In a Rails app this is done automatically, but if you are managing your own threads (as rufus does), you have to do it yourself.

Luckily, there is an API for this: if you put your code inside a with_connection block, it will get a connection from the pool and release it when the block is done:

ActiveRecord::Base.connection_pool.with_connection do
  #your code here
end

In your case:

def db
  ActiveRecord::Base.connection_pool.with_connection do
    yield
  end
end

That should do the trick.
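If you want to convince yourself that the connection goes back to the pool even when the job raises, a variant of the helper with an injectable pool can be exercised without a database. This is only a sketch: CountingPool is a hypothetical stand-in that counts checkouts, and it assumes the real with_connection returns the connection when the block finishes, as the API docs linked below describe.

```ruby
# Same helper as above, with the pool injectable so it can be tried
# without a database. The default is only evaluated when no pool is passed.
def db(pool = ActiveRecord::Base.connection_pool)
  pool.with_connection { yield }
end

# Hypothetical stand-in that only counts checkouts and checkins.
class CountingPool
  attr_reader :checked_out

  def initialize
    @checked_out = 0
  end

  # Mimics the documented contract: check out, yield, always check back in.
  def with_connection
    @checked_out += 1
    yield
  ensure
    @checked_out -= 1
  end
end

pool = CountingPool.new

begin
  db(pool) { raise "simulated postgres hiccup" }
rescue RuntimeError => e
  puts "job failed: #{e.message}"
end

puts "connections still checked out: #{pool.checked_out}"
```

In the real job you would call db without an argument, so it falls back to ActiveRecord::Base.connection_pool.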

http://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html#method-i-with_connection



Answer 2:

The reason may be that you have many threads using up all the connections: if the DataFeed.update method takes more than 5 seconds, then runs of your block can overlap.

try

scheduler.every("5s",  :allow_overlapping => false) do
#...
end
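For what it's worth, the idea behind disallowing overlap can be sketched in plain Ruby, independent of the scheduler. NoOverlap below is a hypothetical helper, not part of rufus-scheduler, and it assumes the desired behaviour is to skip a run while the previous one is still in progress.

```ruby
# Hypothetical helper: run the block only if the previous run has finished.
class NoOverlap
  def initialize
    @lock = Mutex.new
  end

  # Returns :ran if the block was executed, :skipped if a run was in progress.
  def run
    return :skipped unless @lock.try_lock

    begin
      yield
      :ran
    ensure
      @lock.unlock
    end
  end
end

guard = NoOverlap.new
started = Queue.new
finish = Queue.new

# A slow run that holds the guard until it is told to finish.
slow = Thread.new do
  guard.run do
    started << true
    finish.pop
  end
end

started.pop                                # wait until the slow run is in progress
puts "during slow run: #{guard.run { }}"   # the guard is held by the slow thread
finish << true
slow.join
puts "after slow run: #{guard.run { }}"    # the guard is free again
```

With a guard like this, a slow DataFeed.update holds at most one connection at a time instead of piling up runs that each want their own.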

Also try releasing the connection instead of closing it:

ActiveRecord::Base.connection_pool.release_connection


Answer 3:

I don't really know rufus-scheduler, but I have some ideas.

The first possibility is a bug in rufus-scheduler that keeps it from checking out database connections properly. If that's the case, the only solution is to clear stale connections manually, as you already do, and to inform the author of rufus-scheduler about your issue.

Another possibility is that your DataFeed operation takes a really long time, and because it is performed every 5 seconds, Rails runs out of database connections. That seems rather unlikely, though.