I am writing an Open MPI application that consists of a server part and a client part, which are launched separately:
me@server1:~> mpirun server
and
me@server2:~> mpirun client
server creates a port using MPI_Open_port. The question is: does Open MPI have a mechanism to communicate the port to client? I suppose that MPI_Publish_name and MPI_Lookup_name don't work here, because server wouldn't know which other computer the information should be sent to.
To me, it looks like only processes that were started by a single mpirun can communicate via MPI_Publish_name.
I also found ompi-server, but the documentation is too minimalistic for me to understand this. Does anyone know how this is used?
Related: MPICH: How to publish_name such that a client application can lookup_name it? and https://stackoverflow.com/questions/9263458/client-server-example-using-ompi-does-not-work
MPI_Publish_name is supplied with an MPI info object, which could have an Open MPI specific boolean key ompi_global_scope. If this key is set to true, then the name would be published to the global scope, i.e. to an already running instance of ompi-server. MPI_Lookup_name by default first does a global name lookup if the URI of the ompi-server was provided.
With a dedicated Open MPI server
The process involves several steps:
1) Start the ompi-server somewhere in the cluster where it can be reached from all nodes. For debugging purposes you may pass it the --no-daemonize -r + arguments. It then starts and prints to the terminal a URI similar to this one:
1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351
2) In the server, build an MPI info object and set the ompi_global_scope key to true, i.e. create the object with MPI_Info_create and call MPI_Info_set(info, "ompi_global_scope", "true"). Then pass the info object to MPI_Publish_name, together with the service name and the port name obtained from MPI_Open_port.
3) In the client, the call to MPI_Lookup_name automatically does the lookup in the global context first (this can be changed by providing the proper key in the MPI info object, but in your case the default behaviour should suffice).
In order for both client and server code to know where the ompi-server is located, you have to give its URI to both mpirun commands with the --ompi-server '1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351' option. The quotes keep the shell from treating the semicolons as command separators.
Another option is to have ompi-server write the URI to a file, which can then be read on the node(s) where mpirun is to be run. For example, if you start the server on the same node where both mpirun commands are executed, then you could use a file in /tmp. If you start the ompi-server on a different node, then a shared file system (NFS, Lustre, etc.) would do. Either way, the set of commands would be:
$ ompi-server -r file:/path/to/urifile
$ mpirun --ompi-server file:/path/to/urifile server
$ mpirun --ompi-server file:/path/to/urifile client
Serverless method
If you run both mpirun commands on the same node, --ompi-server can also specify the PID of an already running mpirun instance to be used as a name server. This allows you to use local name publishing in the server (i.e. skip the "run an ompi-server" and "make an info object" parts). The sequence of commands would be:
$ mpirun server
$ mpirun --ompi-server pid:12345 client
where 12345 should be replaced by the real PID of the server's mpirun.
You can also have the server's mpirun print its URI and pass that URI to the client's mpirun (shown here with the example URI in place of the one actually printed):
$ mpirun --report-uri + server
$ mpirun --ompi-server '1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351' client
You could also have the URI written to a file if you specify /path/to/urifile (note: no file: prefix here) instead of + after the --report-uri option:
$ mpirun --report-uri /path/to/urifile server
$ mpirun --ompi-server file:/path/to/urifile client
Note that the URI returned by mpirun has the same format as that of an ompi-server, i.e. it includes the host IP address, so it also works if the second mpirun is executed on a different node that is able to talk to the first node via TCP/IP (and /path/to/urifile lives on a shared file system).
I tested all of the above with Open MPI 1.6.1. Some of the variants might not work with earlier versions.
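If you script the launch variants above, a few small shell helpers can make them less fragile. What follows is only a sketch and not part of Open MPI: the function names (wait_for_uri_file, pid_arg, count_args) and the default timeout are made up for illustration, and a POSIX shell is assumed. The last lines also show why a literal URI has to be quoted on the command line.

```shell
#!/bin/sh
# Hypothetical helpers around the launch variants described above.
# Nothing in this file is provided by Open MPI itself.

# wait_for_uri_file PATH [TIMEOUT]
# Wait until PATH exists and is non-empty (i.e. something has written a
# URI into it), or give up after TIMEOUT seconds (default: 10).
# Returns 0 on success and 1 on timeout.
wait_for_uri_file() {
    uri_path=$1
    timeout=${2:-10}
    waited=0
    while [ ! -s "$uri_path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $uri_path" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# pid_arg PID
# Print PID in the pid:<PID> form used with --ompi-server, after checking
# that PID is numeric and names a live process the current user may signal.
pid_arg() {
    case $1 in
        '' | *[!0-9]*)
            echo "not a PID: '$1'" >&2
            return 1
            ;;
    esac
    if ! kill -0 "$1" 2>/dev/null; then
        echo "no such process (or not ours): $1" >&2
        return 1
    fi
    printf 'pid:%s\n' "$1"
}

# count_args ARGS...
# Print how many arguments were received; used below to show quoting.
count_args() {
    echo "$#"
}

# A literal URI contains semicolons. Quoted, it reaches the program as one
# argument; unquoted, the shell would end the command at the first ';'.
uri='1221656576.0;tcp://10.1.13.164:36351;tcp://192.168.221.41:36351'
count_args --ompi-server "$uri"   # prints 2
```

With these in place, the file-based variant could be run as wait_for_uri_file /path/to/urifile && mpirun --ompi-server file:/path/to/urifile client, and the PID-based variant as mpirun --ompi-server "$(pid_arg 12345)" client, so that a missing URI file or a stale PID is reported before mpirun starts.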