I have about six Sidekiq workers which perform JSON crawling. Depending on the endpoint's dataset size, they finish in anywhere between 1 minute and 4 hours. Watching the long-running one (the 4-hour job) in particular, I see a slight increase in memory usage over time.
That is not a problem in itself, but it becomes one when I schedule the same worker jobs again: the memory is not deallocated and stacks up until the Linux OOM killer gets rid of my Sidekiq process.
A memory leak? I watched the number of objects per class in ObjectSpace:
ObjectSpace.each_object.each_with_object(Hash.new(0)) { |o, counts| counts[o.class] += 1 }
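To see whether a single job run actually leaks instances of a specific class, one can diff two such snapshots. A minimal sketch (MyCrawlWorker is a placeholder for one of my workers):

def object_counts
  ObjectSpace.each_object.each_with_object(Hash.new(0)) { |o, counts| counts[o.class] += 1 }
end

before = object_counts
MyCrawlWorker.new.perform   # placeholder: run one job inline
GC.start                    # full GC so short-lived objects don't skew the diff
after = object_counts

# print only the classes whose instance count grew across the run
(before.keys | after.keys).each do |klass|
  growth = after[klass] - before[klass]
  puts "#{klass}: +#{growth}" if growth > 0
end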
There is no real increase there: the set of hashes, arrays, etc. stays roughly the same, short-lived spikes are swept away by the garbage collector, and GC.stat[:count] tells me that the garbage collector is running, too.
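This is roughly how I watch it (a sketch; the GC.stat keys differ between MRI versions, on 2.0 :count is the number of GC runs and :heap_used the number of heap pages in use):

Thread.new do
  loop do
    stat = GC.stat
    puts "GC runs: #{stat[:count]}, heap pages in use: #{stat[:heap_used]}"
    sleep 60
  end
end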
Even after the worker finishes, i.e. I get the [Done] logged and no workers are busy any more, the memory is not deallocated. What are the reasons for that? Can I do anything about it? Write a finalizer?
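For reference, a finalizer would look like this (a sketch; Crawler is a placeholder class, and the callback must not capture self, otherwise the object is kept alive forever and can never be collected):

class Crawler  # placeholder for one of my worker-side objects
  def initialize
    ObjectSpace.define_finalizer(self, self.class.finalizer)
  end

  def self.finalizer
    # runs after the instance has been garbage collected
    proc { |object_id| puts "Crawler #{object_id} was collected" }
  end
end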
The only current solution: Restart the Sidekiq process.
I am on MRI Ruby 2.0.0.
For the JSON parsing I use Yajl, which is a C binding. I need it because it seems to be the only fast JSON parser that properly implements streamed reading and writing.
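This is roughly how I use it (a minimal sketch of the yajl-ruby streaming API; the file name and handler are placeholders):

require 'yajl'

# Stream-parse a large JSON document from an IO; Yajl reads the IO
# in chunks instead of loading the whole body into memory first.
File.open('big_dataset.json', 'r') do |io|
  parser = Yajl::Parser.new
  parser.on_parse_complete = lambda do |document|
    handle(document)  # placeholder: process one complete top-level JSON value
  end
  parser.parse(io)
end

# streamed writing works the same way in reverse:
# Yajl::Encoder.new.encode(payload, io)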