I have a rake task that processes a set of records and saves the results in
another collection:
batch = []
Record.where(:type => 'a').each do |r|
  batch << make_score(r)
  if batch.size % 100 == 0
    Score.collection.insert(batch)
    batch = []
  end
end
I'm processing about 100K records at a time. Unfortunately, at around 20 minutes I get a Query response returned CURSOR_NOT_FOUND
error.
The MongoDB FAQ says to either use skip and limit or turn off timeouts. With skip and limit, the whole thing was about 2-3 times slower.
How can I turn off timeouts when using Mongoid?
The MongoDB docs say you can pass in a timeout boolean; if timeout is false, the cursor will never time out:
collection.find({"type" => "a"}, {:timeout=>false})
In your case:
Record.collection.find({:type=>'a'}, :timeout => false).each ...
I also recommend you look into map-reduce with Mongo. It seems tailor-made for this sort of collection manipulation: http://www.mongodb.org/display/DOCS/MapReduce
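Separately from the timeout, note that the loop in the question only inserts when the batch reaches exactly 100, so a trailing partial batch would be dropped if the record count is not a multiple of 100. A minimal sketch of one way to batch with Enumerable#each_slice follows. The make_score body and the in-memory data are hypothetical stand-ins (the real make_score is not shown in the question), and the inserted array takes the place of Score.collection.insert:

```ruby
# Hypothetical stand-in for the question's make_score; the real one is not shown.
def make_score(record)
  { :record_id => record[:id], :score => record[:value] * 2 }
end

# Yields arrays of at most batch_size scores, including the final short batch.
def each_score_batch(records, batch_size = 100)
  records.each_slice(batch_size) do |slice|
    yield slice.map { |r| make_score(r) }
  end
end

# In-memory data standing in for Record.where(:type => 'a').
records = (1..250).map { |i| { :id => i, :value => i } }

# Stands in for Score.collection.insert(batch).
inserted = []
each_score_batch(records) { |batch| inserted << batch }
```

Since each_slice only needs each, the same helper should work on any enumerable source, presumably including a query result, though that is an assumption worth checking against your Mongoid version.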
In Mongoid 3 you can use this:
ModelName.all.no_timeout.each do |m|
  # do something with model
end
Which is pretty handy.
It does seem, for now at least, you have to go the long route and query via the Mongo driver:
Mongoid.database[collection.name].find({ a_query }, { :timeout => false }) do |cursor|
  cursor.each do |row|
    do_stuff
  end
end
Here is the workaround I used: create an array to hold the full records, and work from that array like this:
products = []
Product.all.each do |p|
  products << p
end
products.each do |p|
  # Do your magic
end
Dumping all records into the array will most likely finish before the timeout, unless you are working with an extremely large number of records. It will also consume a lot of memory if the records are large or numerous, so keep that in mind.
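If memory is the concern, paging with skip and limit (the other option mentioned in the question, which the asker found slower) keeps only one page in memory at a time. A rough sketch, assuming a hypothetical fetch_page callable that takes a skip and a limit and returns an array; here it is backed by a plain array so the example runs on its own:

```ruby
# Walks a data source page by page. fetch_page is a hypothetical callable
# taking (skip, limit) and returning an array of records. With Mongoid it
# might wrap something like Product.all.skip(skip).limit(limit).to_a.
def each_in_pages(page_size, fetch_page)
  skip = 0
  loop do
    page = fetch_page.call(skip, page_size)
    break if page.empty?          # no more records to read
    page.each { |record| yield record }
    skip += page_size             # advance to the next page
  end
end

# Array-backed fetcher standing in for a real query.
all_products = (1..25).to_a
fetch = lambda { |skip, limit| all_products.drop(skip).first(limit) }

seen = []
each_in_pages(10, fetch) { |p| seen << p }
```

Each page is a short, separate query, so no single cursor stays open long enough to time out. The trade-off is that the server has to walk past the skipped records on every page.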