How do I assign 2 MPI processes per core?
For example, if I do
mpirun -np 4 ./application
then it should use 2 physical cores to run 4 MPI processes (2 processes per core). I am using Open MPI 1.6. I tried
mpirun -np 4 -nc 2 ./application
but wasn't able to run it.
It complains mpirun was unable to launch the specified application as it could not find an executable:
If you use PBS or a similar batch system, I would suggest this kind of submission:
qsub -l select=128:ncpus=40:mpiprocs=16 -v NPROC=2048 ./pbs_script.csh
This submission selects 128 compute nodes, requests 40 cores on each, and runs 16 MPI processes per node (128 nodes times 16 processes gives the 2048 in NPROC). In my case, each node has 20 physical cores.
Requesting all 40 cores reserves the whole node, so nobody else can use it and compete with your job for resources.
I'm not sure whether you have multiple machines, or exactly how you want the processes distributed, but I'd consider reading the mpirun man page. The manual indicates that it has ways of binding processes to different things, including nodes, sockets, and CPU cores.
It's important to note that you will get roughly this effect without any binding if you simply run twice as many processes as you have CPU cores, since the operating system scheduler will tend to distribute them evenly over the cores to share the load.
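As a rough sketch of that oversubscription idea, the helper below only builds the mpirun command line for twice as many ranks as cores. The function name oversubscribe_cmd is hypothetical, and falling back to nproc is an assumption: nproc counts logical CPUs, so on a hyperthreaded machine the physical core count should be passed explicitly.

```shell
#!/bin/sh
# Hypothetical helper: print an mpirun command that starts two ranks per core.
# $1 is the number of cores; if omitted, fall back to nproc (GNU coreutils),
# which reports logical CPUs rather than physical cores.
oversubscribe_cmd() {
    cores=${1:-$(nproc)}
    np=$((cores * 2))
    # Print instead of executing, so the command can be checked first.
    echo "mpirun -np $np ./application"
}

# Example for a dual-core machine.
oversubscribe_cmd 2
```

Nothing here pins processes to cores; placement is left entirely to the scheduler.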
I'd try something like the following, though the manual is somewhat ambiguous and I'm not 100% sure it will behave as intended, as long as you have a dual core:
orterun (the Open MPI SPMD/MPMD launcher; mpirun and mpiexec are just symlinks to it) has some support for process binding, but it is not flexible enough to let you bind two processes per core. You can try with -bycore -bind-to-core, but it will err when all cores already have one process assigned to them.
But there is a workaround: you can use a rankfile where you explicitly specify which slot to bind each rank to. For example, in order to run 4 processes on a dual-core CPU with 2 processes per core, you would launch the job with a rankfile, where the rankfile is a text file whose content places ranks 0 and 1 on core 0 of processor 0 and ranks 2 and 3 on core 1 of processor 0. Ugly but works.
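A minimal sketch of the rankfile approach described above, assuming a single host named localhost and a file named rankfile (both are assumptions, as is the helper name write_rankfile). Each entry uses the socket:core slot form; check the mpirun man page for your Open MPI version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: write a rankfile that puts two ranks on each core
# of a dual-core CPU. Each line reads: rank <N>=<host> slot=<socket>:<core>
write_rankfile() {
    cat > "$1" <<'EOF'
rank 0=localhost slot=0:0
rank 1=localhost slot=0:0
rank 2=localhost slot=0:1
rank 3=localhost slot=0:1
EOF
}

write_rankfile rankfile
# Then launch with the rankfile (-rf is the short form of --rankfile):
#   mpiexec -np 4 -H localhost -rf rankfile ./application
```

The launch line is left as a comment so the generated file can be inspected before anything is started.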
Edit: From your other question it becomes clear that you are actually running on a hyperthreaded CPU. Then you would have to figure out the physical numbering of your logical processors (it's a bit confusing, but physical numbering corresponds to the value of processor: as reported in /proc/cpuinfo). The easiest way to obtain it is to install the hwloc library. It provides the hwloc-ls tool, which prints the machine topology; physical IDs are listed after P# in the brackets. In your 8-core case the second hyperthread of the first core (core 0) would most likely have ID 8, and hence your rankfile would refer to slots by that physical numbering (note the p prefix - don't omit it).
If you don't have hwloc or you cannot install it, then you would have to parse /proc/cpuinfo on your own. Hyperthreads would have the same values of physical id and core id but different processor and apicid. The physical ID is equal to the value of processor.
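The manual /proc/cpuinfo route could be sketched as below. This assumes the x86 Linux layout, in which each processor entry lists processor, then physical id, then core id. The function name cpuinfo_to_rankfile and the host name localhost are hypothetical. The slot=pN form follows the p prefix mentioned above; treat the exact rankfile syntax as something to verify against your Open MPI version.

```shell
#!/bin/sh
# Hypothetical helper: read a /proc/cpuinfo-style file, group logical
# processors by (physical id, core id), and print one rankfile line per
# logical processor so that consecutive ranks share a physical core.
cpuinfo_to_rankfile() {
    awk -F': *' '
        /^processor/   { proc = $2 }
        /^physical id/ { sock = $2 }
        /^core id/     { print sock, $2, proc }
    ' "${1:-/proc/cpuinfo}" |
    sort -n -k1,1 -k2,2 -k3,3 |
    awk '{ printf "rank %d=localhost slot=p%s\n", NR - 1, $3 }'
}

# Example: keep only the first 4 ranks (2 cores, 2 hyperthreads each).
#   cpuinfo_to_rankfile | head -n 4 > rankfile
```

Sorting by socket, core, and processor number puts hyperthread siblings next to each other, so ranks 0 and 1 land on the first core and ranks 2 and 3 on the second.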