I am working with MPI, and I have a certain hierarchy of operations. For a particular value of a parameter _param, I launch 10 trials, each running a specific process on a distinct core. For n values of _param, the code runs in the following hierarchy:
driver_file -> launches one process, which checks whether more than 10 processes are available. If so, it launches an instance of coupling_file with a specific _param value passed as an argument
coupling_file -> does some elementary computation, then launches 10 processes using MPI_Comm_spawn(), each corresponding to a trial_file, passing _trial as an argument
trial_file -> computes work, returns values to the coupling_file
I am facing two dilemmas, namely:

How do I evaluate the required condition for the cores in driver_file? That is, how do I find out how many processes have terminated, so that I can correctly schedule processes on idle cores? I thought about adding a blocking MPI_Recv() and using it to pass a variable that would tell me when a certain process has finished, but I'm not sure if this is the best solution.

How do I ensure that processes are assigned to different cores? I had thought about using something like
mpiexec --bind-to-core --bycore -n 1 coupling_file
to launch one coupling_file. This will be followed by something like
mpiexec --bind-to-core --bycore -n 10 trial_file
launched by the coupling_file. However, if I am binding processes to a core, I don't want the same core to have two or more processes. That is, I don't want _trial_1 of coupling_1 to run on core x, and then launch another process coupling_2 whose _trial_2 also gets bound to core x.
Any input would be appreciated. Thanks!
If it is an option for you, I'd drop the spawning processes thing altogether, and instead start all processes at once. You can then easily partition them into chunks working on a single task. A translation of your concept could for example be:
In your code you then can do something like:
I think following this approach would allow you to solve both your issues. Availability of process groups is detected via MPI_Wait*, though you might want to change the logic above so that a group notifies the master only at the end of its task; the master then sends new data once a group is actually done, rather than while a previous trial is still running and another process group might have been faster. And pinning is resolved because you have a fixed number of processes, which can be properly pinned during the usual startup.
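For that startup, a single pinned launch could look like the following. Note this uses Open MPI's flag spelling, and the rank count (1 driver + 4 groups of 10 = 41) is just an example matching the layout sketched above; MPICH and Intel MPI use different binding options:

```shell
# One pinned launch of all ranks at once: each rank is bound to its
# own core, so no two trials ever share a core.
mpiexec --bind-to core --map-by core -n 41 ./program
```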