I have a set of computational operations that need to be performed on a cluster (maybe like 512 MPI processes). Right now, I have the root node on the cluster open a socket and transfer data to my local computer in between the compute operations, but I'm wondering if it's possible to just create two MPI groups, one of them being my local machine and the other the remote cluster, and to send data between them using MPI calls.
Is this possible?
Yes, it is possible, as long as there is a network path between the cluster node and your machine. The MPI standard provides the abstract mechanisms to do it, while Open MPI provides a really simple way to make things work. Look into the Process Creation and Management section of the standard (Chapter 10 of MPI-2.2), and specifically into the Establishing Communication subsection (§10.4 of MPI-2.2). Basically the steps are:

1. The server job (the one running on the cluster) opens a port using `MPI_Open_port()`. This MPI call returns a unique port name that then has to be published as a well-known service name using `MPI_Publish_name()`. Once the port is opened, it can be used to accept client connections by calling the blocking routine `MPI_Comm_accept()`. The job has now become the server job.
2. The client job (the one running on your local machine) first resolves the port name from the service name by calling `MPI_Lookup_name()`. Once it has the port name, it can call `MPI_Comm_connect()` in order to connect to the remote server.
3. Once `MPI_Comm_connect()` is paired with the respective `MPI_Comm_accept()`, the two jobs establish an intercommunicator between them, and messages can then be sent back and forth.

One intricate detail is how the client job can look up the port name given only the service name. This is a less documented part of Open MPI, but it is quite easy: you have to provide the `mpiexec` command that you use to start the client job with the URI of the `mpiexec` of the server job, which acts as a sort of directory service. To do that, launch the server job with the `--report-uri -` argument to make it print its URI to the standard output. It will give you a long URI of the form `1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351`. You then have to supply this URI to the client `mpiexec` with the `--ompi-server uri` option.

Note that the URI contains the addresses of all configured and enabled network interfaces present on the node where the server's `mpiexec` is started. You should ensure that the client is able to reach at least one of them. Also ensure that the TCP BTL component is in the list of enabled BTL components, otherwise no messages can flow. The TCP BTL is usually enabled by default, but on some InfiniBand installations it is explicitly disabled, either by setting the corresponding value of the environment variable `OMPI_MCA_btl` or in the default Open MPI MCA configuration file. MCA parameters can be overridden with the `--mca` option.

Also see the answer that I gave to a similar question.
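The server/client steps above can be sketched in C roughly as follows. This is a minimal illustration, not a complete program: the service name `"compute_service"` is an arbitrary choice, error checking is omitted, and the actual data exchange is left as a placeholder.

```c
/* Combined sketch of both sides; in practice these are two programs. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* ---- server side (the cluster job) ---- */
void run_server(void)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm client;
    int rank;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Open a port and publish it under a well-known service name. */
        MPI_Open_port(MPI_INFO_NULL, port_name);
        MPI_Publish_name("compute_service", MPI_INFO_NULL, port_name);
    }

    /* Collective over MPI_COMM_WORLD; blocks until a client connects.
       'client' becomes an intercommunicator to the client job. */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);

    /* ... exchange data over 'client' with MPI_Send/MPI_Recv ... */

    MPI_Comm_disconnect(&client);
    if (rank == 0) {
        MPI_Unpublish_name("compute_service", MPI_INFO_NULL, port_name);
        MPI_Close_port(port_name);
    }
}

/* ---- client side (the job on your local machine) ---- */
void run_client(void)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    /* Resolve the published service name to a port name, then connect.
       'server' becomes an intercommunicator to the server job. */
    MPI_Lookup_name("compute_service", MPI_INFO_NULL, port_name);
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);

    /* ... exchange data over 'server' with MPI_Send/MPI_Recv ... */

    MPI_Comm_disconnect(&server);
}
```

In the resulting intercommunicator, ranks of the remote job are addressed directly, so `MPI_Send(..., /*dest=*/0, ..., client)` from the server reaches rank 0 of the client job.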
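Putting the `mpiexec` options together, the launch commands might look like this (the binary names and process counts are illustrative; the URI shown is the example value from above and will differ on your system):

```shell
# On the cluster: start the server job and make its mpiexec print
# its URI to standard output ("-" means stdout).
mpiexec --report-uri - -np 512 ./server

# It prints a long URI such as:
#   1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351

# On the local machine: point the client's mpiexec at the server's
# mpiexec, which acts as the name directory. Quote the URI, since it
# contains semicolons that the shell would otherwise interpret.
mpiexec --ompi-server "1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351" \
        -np 1 ./client

# If the TCP BTL has been disabled (e.g. on some InfiniBand clusters),
# re-enable it explicitly on both sides:
mpiexec --mca btl self,tcp --report-uri - -np 512 ./server
```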
Yes, it should just work out of the box if there is a TCP/IP connection available (MPI communicates over a random high TCP port, if TCP is used as the transport layer). Try adding your machine to the hostfile which you supply to `mpirun`. If that doesn't work, you can connect to your machine directly using `MPI_Open_port`, which doesn't require `mpirun`.
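A minimal sketch of the hostfile approach, assuming the cluster nodes and your workstation can resolve each other; the hostnames and slot counts here are made up:

```shell
# hosts.txt -- cluster nodes plus the local workstation:
#   node001   slots=64
#   node002   slots=64
#   mylaptop  slots=1
mpirun --hostfile hosts.txt -np 129 ./my_app
```

Note that this launches your machine as part of the same MPI job, so it needs a compatible Open MPI installation and passwordless SSH access from the node running `mpirun`.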