Parallel R on a Windows cluster

2019-01-20 08:52发布

问题:

I've got a Windows HPC Server running with some nodes in the backend. I would like to run Parallel R using multiple nodes from the backend. I think Parallel R might be using SNOW on Windows, but not too sure about it. My question is, do I need to install R also on the backend nodes? Say I want to use two nodes, 32 cores per node:

cl <- makeCluster(c(rep("COMP01",32),rep("COMP02",32)),type="SOCK")

Right now, it just hangs.

What else do I need to do? Do the backend nodes need some kind of sshd running to be able to communicate each other?

回答1:

Setting up snow on a Windows cluster is rather difficult. Each of the machines needs to have R and snow installed, but that's the easy part. To start a SOCK cluster, you would need an sshd daemon running on each of the worker machines, but you can still run into troubles, so I wouldn't recommend it unless you're good at debugging and Windows system administration.

I think your best option on a Windows cluster is to use MPI. I don't have any experience with MPI on Windows myself, but I've heard of people having success with the MPICH and DeinoMPI MPI distributions for Windows. Once MPI is installed on your cluster, you also need to install the Rmpi package from source on each of your worker machines. You would then create the cluster object using the makeMPIcluster function. It's a lot of work, but I think it's more likely to eventually work than trying to use a SOCK cluster due to the problems with ssh/sshd on Windows.

If you're desperate to run a parallel job once or twice on a Windows cluster, you could try using manual mode. It allows you to create a SOCK cluster without ssh:

workers <- c(rep("COMP01",32), rep("COMP02",32))
cl <- makeSOCKluster(workers, manual=TRUE)

The makeSOCKcluster function will prompt you to start each one of the workers, displaying the command to use for each. You have to manually open a command window on the specified machine and execute the specified command. It can be extremely tedious, particularly with many workers, but at least it's not complicated or tricky. It can also be very useful for debugging in combination with the outfile='' option.