I'm running a Bayesian MCMC probit model, and I'm trying to implement it in parallel. I'm getting confusing results about the performance of my machine when comparing parallel to serial. I don't have a lot of experience doing parallel processing, so it is possible I'm not doing it right.
I'm using MCMCprobit in the MCMCpack package for the probit model, and for parallel processing I'm using parLapply in the parallel package.
Here's my code for the serial run, and the results from system.time:
system.time(serial<-MCMCprobit(formula=econ_model,data=mydata,mcmc=10000,burnin=100))
user system elapsed
657.36 73.69 737.82
Here's my code for the parallel run:
# Setting up the functions for parLapply:
probit_modeling <- function(...) {
  args <- list(...)
  library(MCMCpack)
  MCMCprobit(formula=args$model, data=args$data, burnin=args$burnin, mcmc=args$mcmc, thin=1)
}
probit_Parallel <- function(mc, model, data, burnin, mcmc) {
  cl <- makeCluster(mc)
  ## To make this reproducible:
  clusterSetRNGStream(cl, 123)
  library(MCMCpack) # needed for c() method on master
  probit.res <- do.call(c, parLapply(cl, seq_len(mc), probit_modeling, model=model, data=data,
                                     mcmc=mcmc, burnin=burnin))
  stopCluster(cl)
  return(probit.res)
}
system.time(test<-probit_Parallel(model=econ_model,data=mydata,mcmc=10000,burnin=100,mc=2))
And the results from system.time:
user system elapsed
0.26 0.53 1097.25
Any ideas why user and system times would be so much shorter for the parallel process, but the elapsed time so much longer? I tried it at shorter MCMC runs (100 and 1000), and the story is the same. I'm assuming I'm making a mistake somewhere.
Here are my computer specifications:
- R 3.1.3
- 8 GB memory
- Windows 7 64 bit
- Intel Core i5 2520M CPU, dual core
It appears to me that each of the two workers is doing as much work as the sequential version performs, because each one is passed the full mcmc value. To run faster than the sequential code, each worker should perform only a fraction of the total work. In this example that might be accomplished by dividing mcmc by the number of workers, although that may not be what you really want to do. I think that explains the long elapsed time reported by system.time.
The "user" and "system" times are short because they are the times for the master process, which uses very little CPU time while executing parLapply: the real CPU time is used by the workers, and system.time does not report it.
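As a rough sketch of the "divide mcmc by the number of workers" idea, something like the following might work. The names iterations_per_worker, probit_chunk and probit_parallel_split are hypothetical and not part of any package. The sketch also assumes that MCMCprobit draws from MCMCpack's own generator rather than R's, so that clusterSetRNGStream would not make the chains differ; it passes seed = list(NA, i) to pick a separate L'Ecuyer substream per worker, which is worth checking against the MCMCprobit help page before relying on it.

```r
library(parallel)

# Hypothetical helper: iterations each worker should run so that the
# workers together produce roughly `total_mcmc` post-burn-in draws.
iterations_per_worker <- function(total_mcmc, mc) {
  ceiling(total_mcmc / mc)
}

# Runs on a worker. `i` is the worker index supplied by parLapply.
probit_chunk <- function(i, model, data, burnin, mcmc) {
  library(MCMCpack)
  # Assumption: seed = list(NA, i) selects L'Ecuyer substream i, so the
  # chains differ between workers.
  MCMCprobit(formula = model, data = data, burnin = burnin,
             mcmc = mcmc, thin = 1, seed = list(NA, i))
}

probit_parallel_split <- function(mc, model, data, burnin, mcmc) {
  cl <- makeCluster(mc)
  on.exit(stopCluster(cl))  # release the workers even if a chain fails
  chains <- parLapply(cl, seq_len(mc), probit_chunk,
                      model = model, data = data, burnin = burnin,
                      mcmc = iterations_per_worker(mcmc, mc))
  # Stack the draws from all workers into one matrix (rows = draws).
  do.call(rbind, lapply(chains, as.matrix))
}
```

With the values from the question, probit_parallel_split(mc=2, model=econ_model, data=mydata, burnin=100, mcmc=10000) would ask each worker for about 5000 iterations. Each worker still pays its own burn-in and the cluster start-up cost, so the elapsed time is unlikely to be exactly half of the serial run.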