We have a daemon that scans a table for dirty bits and then schedules the dirty rows to a delayed_job in batches. To avoid constantly running select from data where dirty = 1, we set up a memcached barrier that wraps the table scan, like this:
loop do # daemon
  until Rails.cache.fetch("have_dirty_rows") do end
  page = 1
  loop do # paginate dirty rows
    dirty_batch = paginate(#:select => "*",
                           :order => "id",
                           :per_page => DIRTY_GET_BATCH_SIZE,
                           :conditions => {:dirty => 1},
                           :page => page)
    if dirty_batch.empty?
      Rails.cache.write("have_dirty_rows", false)
      break
    end
    ...
    page = page.next
  end
end
Unless I add a sleep 0.0001 or similar, the loop still eats 100% CPU. Is there an efficient mechanism in Ruby/Rails that will block on something like the memcached value, or that we can feed from a memcached value, so that it's not polling all the time?
Active polling is bad! Where are the dirty bits coming from? It would be better if the process that sets them used a message queue mechanism (e.g. RabbitMQ) to notify other processes that something has changed in the database.
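To illustrate the difference between polling and blocking, here is a minimal sketch that uses Ruby's built-in Queue as an in-process stand-in for a message broker. This is an assumption for illustration only: it is not RabbitMQ, and the names notifications, processed, and the sample row ids are hypothetical. With a real broker, the consumer's blocking receive would play the role that pop plays here.

```ruby
# Hypothetical sketch: the writer signals the daemon through a blocking queue
# instead of the daemon polling a cache flag. Queue stands in for a real broker.

notifications = Queue.new # thread-safe FIFO from Ruby's core library
processed = []            # collects what the daemon handled (illustration only)

# Stand-in for the daemon loop.
daemon = Thread.new do
  # Queue#pop blocks the thread without spinning until an item arrives.
  # It returns nil once the queue has been closed and drained, ending the loop.
  while (row_id = notifications.pop)
    processed << row_id # a real daemon would schedule the delayed_job here
  end
end

# Stand-in for whatever code sets dirty = 1: it also announces the row id.
[3, 7, 42].each { |id| notifications << id }

notifications.close # no more notifications; lets the daemon exit cleanly
daemon.join

puts processed.inspect
```

The point of the design is that the daemon thread sleeps inside pop until the writer pushes something, so no sleep call or cache flag is needed. Across separate processes the same shape applies, with the broker's queue replacing the in-process one.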