I have a (large) neural net trained with the nnet package in R. I want to be able to simulate predictions from this neural net, and to do so in a parallelised fashion using something like foreach, which I've used before with success (all on a Windows machine).
My code is essentially of the form
library(nnet)
data = data.frame(out=c(0, 0.1, 0.4, 0.6),
                  in1=c(1, 2, 3, 4),
                  in2=c(10, 4, 2, 6))
net = nnet(out ~ in1 + in2, data=data, size=5)
library(doParallel)
registerDoParallel(cores=detectCores()-2)
results = foreach(test=1:10, .combine=rbind, .packages=c("nnet")) %dopar% {
  result = predict(net, newdata = data.frame(in1=test, in2=5))
  return(result)
}
except that the neural net being fit and predicted from is much larger; it's around 300 MB.
The code above runs fine when using a traditional for loop, or when using %do%, but when using %dopar%, everything gets loaded into memory for each core being used - around 700 MB each. If I run it for long enough, R eventually runs out of memory.
Having looked up similar problems, I still have no idea what is causing this. If I omit the predict() call, everything runs smoothly.
How can I have each core lookup the unchanging 'net' rather than having it loaded into memory? Or is it not possible?
CPak's reply explains what's going on; you're effectively running multiple copies (= workers) of the main script in separate R sessions. Since you're on Windows, calling
registerDoParallel(cores = n)
expands to:
cl <- parallel::makeCluster(n, type = "PSOCK")
registerDoParallel(cl)
which is what sets up n independent background R workers, each with its own independent memory address space.
Now, if you'd been on a Unix-like system, it would instead have corresponded to using n forked R workers, cf. parallel::mclapply(). Forked processes are not supported by R on Windows. With forked processing, you would effectively get what you're asking for, because forked child processes will share the objects already allocated by the main process (as long as such objects are not modified), e.g. net.
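For reference, here is a minimal sketch of that forked alternative (Unix-like systems only, so not an option in your case): mclapply() forks the current R process, so the workers read the existing net object rather than each receiving a serialized copy.

library(parallel)

# Forked workers share 'net' copy-on-write with the main process.
results <- do.call(rbind, mclapply(1:10, function(test) {
  predict(net, newdata = data.frame(in1 = test, in2 = 5))
}, mc.cores = 2))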
When you start new parallel workers, you're essentially creating a new environment, which means that whatever operations you perform in that new environment will require access to the relevant variables/functions. For instance, you have to specify .packages = c("nnet") because you require the nnet package within each new worker (environment), and this is how you "clone" or "export" from the global environment to each worker environment.
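To make that export step visible, here is a sketch using an explicit cluster object: clusterEvalQ() plays the role of .packages, and clusterExport() does for net what foreach's automatic variable export would otherwise do behind the scenes. Either way, every worker ends up holding its own full copy of net.

library(parallel)
library(doParallel)

cl <- makeCluster(2)                 # two independent PSOCK worker sessions
registerDoParallel(cl)

clusterEvalQ(cl, library(nnet))      # load nnet in every worker
clusterExport(cl, "net")             # copy 'net' (~300 MB) into every worker

results <- foreach(test = 1:10, .combine = rbind,
                   .noexport = "net") %dopar% {   # stop foreach exporting it again
  predict(net, newdata = data.frame(in1 = test, in2 = 5))
}

stopCluster(cl)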
Because you require the trained neural network to make predictions, you will need to export it to each worker as well, and I don't see a way around the memory blowup you're experiencing. If you're still interested in parallelization but are running out of memory, my only advice is to look into doMPI.
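For completeness, a rough sketch of what a doMPI-backed version might look like. This assumes a working MPI installation plus the Rmpi and doMPI packages, and the details (worker count, how the script is launched) depend on your setup; the loop itself is unchanged, only the registered backend differs.

library(doMPI)

cl <- startMPIcluster(count = 2)     # spawn 2 MPI worker processes
registerDoMPI(cl)

results <- foreach(test = 1:10, .combine = rbind,
                   .packages = "nnet") %dopar% {
  predict(net, newdata = data.frame(in1 = test, in2 = 5))
}

closeCluster(cl)
mpi.quit()                           # shut down MPI (this also quits R)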