Parallel computing with clusters other than snow S

Posted 2020-05-19 04:06

Question:

The recent addition of direct support for parallel computing in R 2.14 sparked a question in my mind. There are numerous options for creating clusters in R. I use snow SOCK clusters on a regular basis, but I know that there are other ways, such as MPI. I use snow SOCK clusters because they do not require installing any additional software (I use Fedora 13).

So, my concrete questions:

  1. Is there a gain in performance when using non-SOCK clusters?
  2. Is it easier to create clusters on multiple computers using non-SOCK clusters?

Answer 1:

1) Only a limited number of benchmarks are available that show MPI to be faster than sockets. But as an R user you probably will not care about these differences: they are in the range of milliseconds, and the number of communications is not that high in embarrassingly parallel problems.
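To get a feel for the communication overhead on your own machine, something like the following rough sketch might help. It assumes the `parallel` package that ships with R 2.14 and later, and two local socket workers. The names `n_calls` and `per_call_ms` are only illustrative, and this is not a rigorous benchmark.

```r
library(parallel)

# Assumption: two local socket workers; no extra software is needed for these.
cl <- makeCluster(2, type = "PSOCK")

# Time many tiny requests. Each clusterCall() sends a trivial task to every
# worker and waits for the replies, so the elapsed time is almost entirely
# communication overhead.
n_calls <- 200
elapsed <- system.time(
  for (i in seq_len(n_calls)) clusterCall(cl, function() NULL)
)[["elapsed"]]
per_call_ms <- 1000 * elapsed / n_calls
cat("approx. overhead per clusterCall (ms):", per_call_ms, "\n")

# An embarrassingly parallel job needs only a few messages:
# parLapply() splits the input into one chunk per worker.
res <- parLapply(cl, 1:8, function(x) x^2)

stopCluster(cl)
```

If the per-call figure is small compared with the run time of one task, the transport (sockets or MPI) is unlikely to matter much for that job.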

2) Yes, because you do not have to provide a list of machine names or IP addresses, which gets complicated for a computer cluster with 100 nodes. But everything depends on your computer cluster. In most cases MPI or PVM is already preinstalled and everything works out of the box using Rmpi, ...
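The difference in setup can be sketched roughly as follows. `start_cluster` is a hypothetical helper written for this illustration, not part of any package. The socket branch defaults to `"localhost"` workers so that it runs on a single machine; on a real cluster you would pass the actual host names. The MPI branch assumes that `snow`, `Rmpi` and a working MPI installation are available, and it is not exercised here.

```r
library(parallel)

# Hypothetical helper: create a cluster of n workers of the given type.
start_cluster <- function(type = c("PSOCK", "MPI"), n = 2,
                          hosts = rep("localhost", n)) {
  type <- match.arg(type)
  if (type == "PSOCK") {
    # Socket cluster: one entry per worker, so every machine must be listed.
    parallel::makeCluster(hosts, type = "PSOCK")
  } else {
    # MPI cluster: only the worker count is given; the MPI runtime decides
    # where the workers run. Assumes snow, Rmpi and MPI are installed.
    snow::makeCluster(n, type = "MPI")
  }
}

# Usage on a single machine with two local socket workers.
cl <- start_cluster("PSOCK", n = 2)
print(length(cl))
stopCluster(cl)
```

With 100 nodes the `hosts` vector has to name every machine, whereas the MPI call would stay a single line with `n = 100`.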