My foray into parallelization continues. I initially had difficulty installing Rmpi, but I got that going (I needed to sudo apt-get it). I should say that I'm running a machine with Ubuntu 10.10.
I ran the same simulation as in my previous question. Recall the system times for the unclustered and SNOW SOCK cluster versions, respectively:
> system.time(CltSim(nSims=10000, size=100))
user system elapsed
0.476 0.008 0.484
> system.time(ParCltSim(cluster=cl, nSims=10000, size=100))
user system elapsed
0.028 0.004 0.375
Now, using an MPI cluster, I get a slowdown relative to not clustering at all:
> stopCluster(cl)
> cl <- getMPIcluster()
> system.time(ParCltSim(cluster=cl, nSims=10000, size=100))
user system elapsed
0.088 0.196 0.604
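(For reference, CltSim and ParCltSim are essentially a serial and a snow-parallelized version of a small CLT-style simulation, roughly along the lines sketched below; see the earlier question for the exact code, since the distribution and defaults here are only placeholders.)
# Rough sketch only -- not the exact functions from the earlier question:
# draw nSims sample means of samples of length 'size', serially and via snow.
CltSim <- function(nSims = 1000, size = 10){
  sapply(1:nSims, function(x) mean(rnorm(size)))
}
ParCltSim <- function(cluster, nSims = 1000, size = 10){
  parSapply(cluster, 1:nSims, function(x) mean(rnorm(size)))
}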
Not sure whether this is useful, but here's info on the cluster created:
> cl
[[1]]
$rank
[1] 1

$RECVTAG
[1] 33

$SENDTAG
[1] 22

$comm
[1] 1

attr(,"class")
[1] "MPInode"

[[2]]
$rank
[1] 2

$RECVTAG
[1] 33

$SENDTAG
[1] 22

$comm
[1] 1

attr(,"class")
[1] "MPInode"

attr(,"class")
[1] "spawnedMPIcluster" "MPIcluster"        "cluster"
Any idea about what could be going on here? Thanks for your assistance as I try out these parallelization options.
Cordially,
Charlie
It's much the same as with your other question: the communication between the nodes in the cluster takes up more time than the actual function does.
This can be illustrated by modifying your functions:
library(snow)
cl <- makeCluster(2)

# Each "simulation" just sleeps for n seconds, so the per-task work is
# controlled exactly and any remaining time is parallelization overhead.
SnowSim <- function(cluster, nSims = 10, n){
  parSapply(cluster, 1:nSims, function(x){
    Sys.sleep(n)
    x
  })
}

library(foreach)
library(doSNOW)
registerDoSNOW(cl)

# Same idea with foreach: one iteration per simulation, combined with c().
ForSim <- function(nSims = 10, n){
  foreach(i = 1:nSims, .combine = c) %dopar% {
    Sys.sleep(n)
    i
  }
}
This way we can simulate a long-running and a short-running function over different numbers of simulations. Let's take two cases: one with a 1-second calculation and 10 loops, and one with a 1-millisecond calculation and 10,000 loops. Both amount to 10 seconds of total computation time:
> system.time(SnowSim(cl,10,1))
user system elapsed
0 0 5
> system.time(ForSim(10,1))
user system elapsed
0.03 0.00 5.03
> system.time(SnowSim(cl,10000,0.001))
user system elapsed
0.02 0.00 9.78
> system.time(ForSim(10000,0.001))
user system elapsed
10.04 0.00 19.81
Basically, what you see is that for long-running functions and few simulations, the parallelized versions cleanly cut the calculation time in half, as expected.
Now, the simulations you run are of the second kind. There you see that the snow solution doesn't really make a difference any more, and that the foreach solution even needs twice as much time. This is simply due to the overhead of communicating to and between the nodes and of handling the data that gets returned. The overhead of foreach is a lot bigger than that of snow, as shown in my answer to your previous question.
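If you want to convince yourself that it really is per-iteration overhead, you can hand foreach a few big blocks of indices instead of one index per iteration, so the overhead is only paid once per block. This is just a sketch of mine (the ForSimChunked name and the ad-hoc block splitting are not from your code), reusing the foreach/doSNOW setup from above:
# Sketch: pay the foreach overhead once per block rather than once per
# simulation by iterating over a few large index blocks.
ForSimChunked <- function(nSims = 10000, n = 0.001, nBlocks = 2){
  blocks <- split(1:nSims, cut(1:nSims, nBlocks, labels = FALSE))
  foreach(idx = blocks, .combine = c) %dopar% {
    sapply(idx, function(x){
      Sys.sleep(n)
      x
    })
  }
}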
I didn't fire up my Ubuntu box to try this with an MPI cluster, but it's basically the same story. There are subtle differences between the different cluster types in the time needed for communication, partly due to differences between the underlying packages.
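If you want to check the MPI case yourself, something along these lines should do it; it's untested on my side and simply reuses SnowSim from above on an MPI-backed snow cluster (it assumes Rmpi is installed and working):
# Untested sketch: the same benchmark on an MPI-backed snow cluster.
library(snow)
clMPI <- makeCluster(2, type = "MPI")       # spawns 2 Rmpi worker processes
system.time(SnowSim(clMPI, 10, 1))          # long tasks: overhead negligible
system.time(SnowSim(clMPI, 10000, 0.001))   # short tasks: overhead dominates
stopCluster(clMPI)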