Process manager in MPI

Posted 2019-03-22 04:39

I am new to MPI and have some questions about job creation and launching. I tried to figure it out on my own, but things are still quite unclear to me. The cluster I am working on has four nodes (A, B, C, D) connected to each other, and MPICH2 is installed on each node. mpiexec -info gives...

.....Configure options: '--prefix=/usr/local/mpich2-1.4.1-install/' '--with-pm=hydra' ....

    Process Manager:                        pmi
    Launchers available:                    ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:           hwloc plpa
    Resource management kernels available:  user slurm ll lsf sge pbs

As far as I understand (please correct me if I am wrong), PMI is the Process Management Interface; Hydra, mpirun and mpiexec are process managers; and PMI provides a way for a process manager to interact with the processes, even when different process managers are used. So my questions are:

1) Why is it showing PMI as the Process Manager?

2) Does pbs play any role?

3) Who is responsible for creating the copies of the executable on the different nodes? (I am launching the job from node A.)

I know the question is very lengthy. I would be thankful for suggestions of some good resources.

1 answer
混吃等死
#2 · 2019-03-22 04:52

There are two types of clusters: those that are under the control of a distributed resource manager (DRM) such as PBS, LSF, S/OGE, etc., and those that are not. A typical DRM provides mechanisms to launch remote processes within the granted allocation and to control those processes, e.g. to send them signals and to get back information about their launch and termination statuses. When the cluster is not under the control of a DRM, the MPI runtime has to implement its own process management. Different MPI libraries take different approaches, but almost all of them boil down to using rsh or ssh to start a daemon on each remote node, which then takes care of the processes on that node. Even when a DRM is in use, the library might still put its own process manager in between in order to provide portability.
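To make the "start a daemon via rsh or ssh" idea concrete, here is a minimal sketch of how a launcher could build one command line per node. It is only an illustration under stated assumptions: the function `build_launch_commands`, the daemon name `proxy_daemon` and its `--control-host` option are hypothetical and are not what Hydra or any other real process manager runs.

```python
import shlex


def build_launch_commands(hosts, daemon_cmd, launcher="ssh", local_host=None):
    """Return one argv list per host for starting a helper daemon there.

    Hypothetical model only: a real process manager passes many more options.
    """
    if launcher not in ("ssh", "rsh"):
        raise ValueError(f"unsupported launcher: {launcher}")
    commands = []
    for host in hosts:
        if host == local_host:
            # No remote shell is needed on the node we are launching from.
            commands.append(list(daemon_cmd))
        else:
            # The remote shell receives the daemon command as one quoted string.
            commands.append([launcher, host, shlex.join(daemon_cmd)])
    return commands


if __name__ == "__main__":
    # Hypothetical daemon name and option, launched from node A.
    daemon = ["proxy_daemon", "--control-host", "A"]
    for argv in build_launch_commands(["A", "B", "C", "D"], daemon, local_host="A"):
        print(argv)
```

Each remote daemon would then start and supervise the MPI processes on its own node, which is why only one rsh/ssh connection per node is needed rather than one per process.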

MPICH comes with two process managers: MPD and Hydra. MPD stands for Multi-Purpose Daemon and is now considered legacy. Hydra is newer and better, as it provides topology-aware process binding and other goodies. No matter which process manager is in use, the library has to talk to it somehow, e.g. to obtain launch information or to request that new processes be launched during MPI_COMM_SPAWN. This is done through PMI, the Process Management Interface.
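As a rough illustration of "obtaining launch information", the sketch below reads a rank and a job size from the environment of a launched process. It assumes that the process manager exports variables named `PMI_RANK` and `PMI_SIZE`; treat those names as an assumption to check against your own installation. The helper `read_pmi_launch_info` is hypothetical, and the real PMI exchange between the library and the process manager involves more than environment variables.

```python
import os


def read_pmi_launch_info(environ=None):
    """Return {'rank': ..., 'size': ...} or None if no launch info is present.

    Assumption: the process manager exports PMI_RANK and PMI_SIZE.
    """
    environ = os.environ if environ is None else environ
    if "PMI_RANK" not in environ or "PMI_SIZE" not in environ:
        # Not started through a process manager that sets these variables.
        return None
    return {"rank": int(environ["PMI_RANK"]), "size": int(environ["PMI_SIZE"])}


if __name__ == "__main__":
    info = read_pmi_launch_info()
    if info is None:
        print("no PMI launch information found in the environment")
    else:
        print(f"process {info['rank']} of {info['size']}")
```

Running this script directly should report that no launch information was found, while running it under mpiexec would show one line per launched process if the assumption about the variable names holds.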

That being said, the mpiexec in your case is the Hydra process manager. The information that you list describes the capabilities of Hydra itself. Since MPICH and its derivatives (e.g. Intel MPI) are probably the only MPI implementations that use Hydra, the latter doesn't need to provide any process management interface other than the one that is native to MPICH, namely PMI. The launchers are the mechanisms that Hydra can use to launch remote processes. ssh and rsh are the obvious choices when no DRM is in use. fork is for starting processes on the local node. Resource management kernels are the mechanisms through which Hydra interacts with DRMs in order to determine things like granted allocations. Some of those can also launch processes, e.g. pbs uses the tm interface of PBS or Torque.

To summarise:

1) Hydra implements the PMI interface in order to be able to talk to MPICH. It doesn't understand other interfaces, e.g. it cannot launch MPI executables compiled against Open MPI.

2) Hydra integrates with PBS-like DRMs (PBSPro, Torque). The integration means that, for example, you don't have to provide a list of hosts to mpiexec since the list of granted nodes is obtained automatically. It also uses the native tm interface of PBS to launch and monitor remote processes.

3) On a higher level, Hydra launches the remote copies. Ultimately, this is done either by the DRM or via rsh/ssh.
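Point 2 can be sketched as follows: when a job runs inside a PBS allocation, the list of granted nodes can be read from the file named by the `PBS_NODEFILE` environment variable, otherwise the user has to supply a host file. This is a simplified model, not Hydra's actual code; the functions `parse_host_lines` and `resolve_hosts` are hypothetical, and the `host:count` line format is assumed here for the user-supplied file.

```python
import os


def parse_host_lines(lines):
    """Count slots per host from lines like 'A', 'A:2', or repeated host names."""
    slots = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        host, _, count = line.partition(":")
        slots[host] = slots.get(host, 0) + (int(count) if count else 1)
    return slots


def resolve_hosts(environ=None, hostfile=None):
    """Prefer the DRM-provided node list; fall back to a user host file."""
    environ = os.environ if environ is None else environ
    path = environ.get("PBS_NODEFILE") or hostfile
    if path is None:
        # Neither a DRM allocation nor a host file: run on the local node only.
        return {"localhost": 1}
    with open(path) as fh:
        return parse_host_lines(fh)


if __name__ == "__main__":
    print(resolve_hosts())
```

Outside of a DRM, the equivalent of the second branch is what you do by hand when you pass a host file listing nodes A to D to mpiexec.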
