Problem Description:
I have a big matrix c loaded in RAM. My goal is to have read-only access to it from parallel workers. However, when I create the connections, whether I use doSNOW, doMPI, big.matrix, etc., the amount of RAM used increases dramatically.
Is there a way to properly create shared memory that all the processes can read from, without creating a local copy of all the data?
Example:
libs <- function(libraries){ # Installs missing libraries and then loads them
  for (lib in libraries){
    if( !is.element(lib, .packages(all.available = TRUE)) ) {
      install.packages(lib)
    }
    library(lib, character.only = TRUE)
  }
}
libra<-list("foreach","parallel","doSNOW","bigmemory")
libs(libra)
#create a matrix of approximately 1GB
c<-matrix(runif(10000^2),10000,10000)
#convert it to a big.matrix
x<-as.big.matrix(c)
# get a description of the matrix
mdesc <- describe(x)
# Create the required connections
cl <- makeCluster(detectCores())
registerDoSNOW(cl)
out <- foreach(linID = 1:10, .combine = c) %dopar% {
  #load bigmemory
  require(bigmemory)
  # attach the matrix via shared memory??
  m <- attach.big.matrix(mdesc)
  #dummy expression to test data acquisition
  c<-m[1,1]
}
closeAllConnections()
RAM:
In the image above you can see that memory usage increases a lot until foreach ends, and only then is it freed.
Alternatively, if you are on Linux/Mac and you want CoW (copy-on-write) shared memory, use forks. First load all your data into the main process, then launch worker processes (forks) with the function mcparallel from the parallel package. You can collect their results with mccollect, or use truly shared memory via the Rdsm library, like this:
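A minimal sketch of this approach (using bigmemory::big.matrix for the shared object, as in the edit below; the 1x1 shape and the written value are just illustrative):

library(parallel)
library(bigmemory)                     # shared-memory matrix

shared <- big.matrix(nrow = 1, ncol = 1, type = "double")
shared[1, 1] <- 1

job <- mcparallel(shared[1, 1] <- 23)  # the fork writes into the shared matrix
mccollect(job)                         # wait for the fork to finish
shared[1, 1]                           # the parent now reads 23; no copy was sent back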
You can confirm that the value really gets updated in the background if you delay the write:
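For example (continuing the sketch above; the one-second delay is arbitrary):

job <- mcparallel({ Sys.sleep(1); shared[1, 1] <- 57 })
shared[1, 1]    # read immediately: still the old value
mccollect(job)  # wait for the delayed write in the fork
shared[1, 1]    # now 57, updated in the background via shared memory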
To control for concurrency and avoid race conditions, use locks:
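A sketch with the synchronicity package (boost.mutex(), lock() and unlock() come from it; the short sleep only widens the race window for the demonstration):

library(synchronicity)                 # interprocess mutexes

mut <- boost.mutex()

safe_incr <- function() {
  lock(mut)                            # only one fork at a time enters this section
  val <- shared[1, 1]
  Sys.sleep(0.2)                       # widen the race window on purpose
  shared[1, 1] <- val + 1
  unlock(mut)
}

shared[1, 1] <- 0
jobs <- lapply(1:5, function(i) mcparallel(safe_incr()))
mccollect(jobs)                        # wait for all forks
shared[1, 1]                           # 5; without the lock, concurrent forks could read the same value and lose increments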
Edit:
I simplified the dependencies a bit by exchanging Rdsm::mgrmakevar for bigmemory::big.matrix. mgrmakevar internally calls big.matrix anyway, and we don't need anything more.

I think the solution to the problem can be seen in the post by Steve Weston, the author of the foreach package, here. There he states:

So I think the problem is that in your code your big matrix c is referenced in the assignment c<-m[1,1]. Just try xyz <- m[1,1] instead and see what happens.

Here is an example with a file-backed big.matrix:
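A sketch along the lines of the code from the question, but with a file-backed matrix and with xyz <- m[1, 1] instead of c<-m[1,1] (the backing-file names x.bin and x.desc are arbitrary):

library(bigmemory)
library(doSNOW)
library(foreach)
library(parallel)

# the data live in x.bin on disk and are memory-mapped, so every worker
# maps the same file instead of receiving its own copy
x <- filebacked.big.matrix(10000, 10000, type = "double",
                           backingfile = "x.bin", descriptorfile = "x.desc",
                           backingpath = getwd())
for (j in 1:ncol(x)) x[, j] <- runif(nrow(x))   # fill column by column
mdesc <- describe(x)

cl <- makeCluster(detectCores())
registerDoSNOW(cl)
out <- foreach(linID = 1:10, .combine = c) %dopar% {
  require(bigmemory)
  m <- attach.big.matrix(mdesc)   # attach the mapping; no copy of the data
  xyz <- m[1, 1]                  # assigned to xyz, as suggested above, so no large global object is referenced in the loop body
  xyz
}
stopCluster(cl)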