system.time and parallel package in R sys.child is

Posted 2019-09-14 20:01

I would like to use system.time in R to get the total CPU time of a multicore function. The problem is that system.time apparently does not capture the CPU time spent by the child processes spawned by the parallel package.

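library(doParallel)      # also attaches foreach, iterators and parallel
cl <- makeCluster(2)     # explicitly created cluster of 2 workers
registerDoParallel(cl)   # register that cluster as the %dopar% backend
timings <- system.time(foreach(i = 1:2) %do% rnorm(1e8))  # sequential baseline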

The timings then look like this:

> timings
   user  system elapsed 
 16.883   5.731  22.899 

The timings add up. Now if I use parallel processing:

timings <- system.time(foreach(i = 1:2) %dopar% rnorm(1e8))
> timings
   user  system elapsed 
  2.445   3.410  20.347 

The user and system times only capture the master process. Specifically, timings[4] and timings[5] show that the user.child and sys.child times are 0.
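For reference, system.time returns a proc_time object with five components, and its print method folds the child times into user and system, which is why only three numbers are printed. A minimal sketch of how to look at all five and add them up; total_cpu is a hypothetical helper name, not something from base R:

```r
# Time a small CPU-bound expression in the current process.
pt <- system.time(sum(rnorm(1e6)))

# The object always carries five named components.
print(names(pt))
print(unclass(pt))   # unclass() bypasses the three-column print method

# Hypothetical helper: total CPU time = self + child, user + system.
# na.rm = TRUE because the child fields can be NA on some platforms.
total_cpu <- function(pt) {
  fields <- c("user.self", "sys.self", "user.child", "sys.child")
  sum(unclass(pt)[fields], na.rm = TRUE)
}

print(total_cpu(pt))
```

If the child components are 0, as in the %dopar% run, this total only reflects the master process.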

What do I have to do to measure the total CPU time in R when using parallel processing?

Note: Moving the cluster startup code into the system.time call did not make a difference.

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

other attached packages:
[1] doParallel_1.0.10 iterators_1.0.8   foreach_1.4.3    

1 answer
放我归山
#2 · 2019-09-14 20:19

@chinsoon12 pointed me in the right direction. user.child and sys.child are only populated when the cluster is created by registerDoParallel, e.g.

registerDoParallel(cores = 2)
timings <- system.time(foreach(i = 1:2) %dopar% rnorm(1e8))

        user.self sys.self elapsed user.child sys.child
timings     0.429    1.978  19.378      9.818     1.386

This is why it worked out of the box with doMC where I did not manually start and stop the cluster via the cl variable.
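The same contrast can be reproduced with the base parallel package alone. This is a rough sketch, assuming (as far as I understand) that registerDoParallel(cores = 2) on a Unix-like system runs tasks in forked processes much like mclapply, while makeCluster starts separate worker processes whose CPU time is not reported back to the master. The work function is a made-up CPU-bound task for illustration:

```r
library(parallel)

# Hypothetical CPU-bound task, only here to burn some CPU time.
work <- function(i) {
  x <- 0
  for (k in 1:2e6) x <- x + sqrt(k)
  x
}

# Fork-based workers: the master waits for the forked children, so their
# CPU time should show up in user.child / sys.child. Forking is not
# available on Windows, hence the guard.
if (.Platform$OS.type == "unix") {
  t_fork <- system.time(mclapply(1:2, work, mc.cores = 2))
  print(unclass(t_fork))
}

# Explicit cluster: workers are separate R processes, so the child
# fields are expected to stay at (or near) zero, as in the question.
cl <- makeCluster(2)
t_psock <- system.time(parLapply(cl, 1:2, work))
stopCluster(cl)
print(unclass(t_psock))
```

Comparing the user.child column of the two results should show the difference described above; the exact numbers depend on the machine.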
