Parallelization in R: %dopar% vs %do%. Why using a

2020-06-04 05:15发布

问题:

I'm experiencing a weird behaviour in my computer when distributing processes among its cores using doMC and foreach. Does someone knows why using single core I got better performance than using 2 cores? As you can see, processing the same code without register any core (which supposedly use only 1 core) yields to a much more time-efficiency processing. While %do% seems to perform better than %dopar%, registering 2 cores out of 4 yield to more time consuming.

require(foreach)
require(doMC)
# 1-core
> system.time(m <- foreach(i=1:100) %dopar% 
+ matrix(rnorm(1000*1000), ncol=5000) )
   user  system elapsed 
  9.285   1.895  11.083 
> system.time(m <- foreach(i=1:100) %do% 
+ matrix(rnorm(1000*1000), ncol=5000) )
   user  system elapsed 
  9.139   1.879  10.979 

# 2-core
> registerDoMC(cores=2)
> system.time(m <- foreach(i=1:100) %dopar% 
+ matrix(rnorm(1000*1000), ncol=5000) )
   user  system elapsed 
  3.322   3.737 132.027
> system.time(m <- foreach(i=1:100) %do% 
+ matrix(rnorm(1000*1000), ncol=5000) )
   user  system elapsed 
  9.744   2.054  11.740 

Using 4 cores in few trials yield to very different outcomes:

> registerDoMC(cores=4)
> system.time(m <- foreach(i=1:100) %dopar% 
{ matrix(rnorm(1000*1000), ncol=5000) } )
   user  system elapsed 
 11.522   4.082  24.444 
> system.time(m <- foreach(i=1:100) %dopar% 
{ matrix(rnorm(1000*1000), ncol=5000) } )
   user  system elapsed 
 21.388   6.299  25.437 
> system.time(m <- foreach(i=1:100) %dopar% 
{ matrix(rnorm(1000*1000), ncol=5000) } )
   user  system elapsed 
 17.439   5.250   9.300 
> system.time(m <- foreach(i=1:100) %dopar% 
{ matrix(rnorm(1000*1000), ncol=5000) } )
   user  system elapsed 
 17.480   5.264   9.170

回答1:

It's the combination of results that eats all the processing time. These are the timings on my machine for the cores=2 scenario if no results are returned. It's essentially the same code, only the created matrices are discarded instead of being returned:

> system.time(m <- foreach(i=1:100) %do% 
+ { matrix(rnorm(1000*1000), ncol=5000); NULL } )
   user  system elapsed 
 13.793   0.376  14.197 
> system.time(m <- foreach(i=1:100) %dopar% 
+ { matrix(rnorm(1000*1000), ncol=5000); NULL } )
   user  system elapsed 
  8.057   5.236   9.970 

Still not optimal, but at least the parallel version is now faster.

This is from documentation of doMC:

The doMC package provides a parallel backend for the foreach/%dopar% function using the multicore functionality of the parallel package.

Now, parallel uses a fork mechanism to spawn identical copies of the R process. Collecting results from separate processes is an expensive task, and this is what you see in your time measurements.