When an out-of-memory error is raised in a parfor, is there any way to kill only one Matlab slave to free some memory instead of having the entire script terminate?
Here is what happens by default when an out-of-memory error occurs in a parfor: the script terminates, as shown in the screenshot below.

I wish there was a way to just kill one slave (i.e. remove a worker from the parpool) or stop using it, to release as much memory as possible from it.
After quite a bit of research, and a lot of trial and error, I think I may have a decent, compact answer. What you're going to do is:

1. Set your memory limit. You could base it on the output of memory, but I like to set it directly.
2. Call memory inside your parfor loop, which returns the memory information for that particular worker.
3. If a worker exceeds the limit inside the parfor, you'll either need to delete or cancel either the task or the worker. I've verified that it works with the code below when there is one task per worker, on a remote cluster.

Insert the following code at the beginning of your parfor contents. Tweak as necessary.

Enjoy! (Fun question, by the way.)
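Something along these lines should work as the check at the top of the parfor body. The limit value and variable names are purely illustrative, memory is only available on Windows, and whether cancelling the current task is enough depends on how your tasks map onto workers:

```matlab
memLimitBytes = 4e9;               % illustrative per-worker limit (4 GB)
w = memory;                        % memory info for this particular worker
if w.MemUsedMATLAB > memLimitBytes
    t = getCurrentTask();          % task currently running on this worker
    if ~isempty(t)
        cancel(t);                 % give up this task rather than killing the whole job
    end
end
```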
One other option to consider: since R2013b, you can open a parallel pool with 'SpmdEnabled' set to false. This allows MATLAB worker processes to die without the whole pool being shut down; see the doc here http://www.mathworks.co.uk/help/distcomp/parpool.html . Of course, you still need to arrange somehow to shut down the workers.
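For example (the profile name and pool size here are just placeholders):

```matlab
% Open a pool whose workers are allowed to die individually (R2013b or later).
pool = parpool('local', 4, 'SpmdEnabled', false);
```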
If you get an out-of-memory error in the master process, there is no chance to fix this. For an out-of-memory error on a slave, the approach below should do it.

The simple idea: restart the parfor again and again with the missing data until you get all results. If one iteration fails, a flag (file) is written, which lets all remaining iterations throw an error as soon as the first error has occurred. This way we get "out of the loop" without wasting time producing further out-of-memory errors.
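A minimal sketch of that pattern, assuming the per-iteration work lives in a hypothetical process() function, finished iterations survive a restart by saving their results to result_<k>.mat files, and parsave is a tiny helper that just calls save (a plain save cannot be called directly inside a parfor):

```matlab
flagFile   = 'oom_flag.tmp';
resultFile = @(k) sprintf('result_%d.mat', k);
n = numel(data);                                   % data: cell array of inputs (illustrative)

while true
    % Find the iterations that have not produced a result file yet.
    todo = find(arrayfun(@(k) ~exist(resultFile(k), 'file'), 1:n));
    if isempty(todo), break; end                   % all iterations finished
    if exist(flagFile, 'file'), delete(flagFile); end

    try
        parfor j = 1:numel(todo)
            k = todo(j);
            if exist(flagFile, 'file')             % another iteration already failed
                error('oom:abort', 'Aborting early after an out-of-memory error.');
            end
            try
                result = process(data{k});         % the actual work (illustrative)
                parsave(resultFile(k), result);    % helper that just calls save()
            catch err
                fclose(fopen(flagFile, 'w'));      % raise the flag for the other workers
                rethrow(err);
            end
        end
    catch
        % The parfor aborted; loop around and retry the still-missing iterations.
    end
end
```

In practice you would also cap the number of restarts so a deterministic (non-memory) error cannot make the outer loop run forever.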