So I have an sbatch (Slurm job scheduler) script in which I am processing a lot of data through three scripts: foo1.sh, foo2.sh and foo3.sh.
foo1.sh and foo2.sh are independent and I want to run them simultaneously. foo3.sh needs the outputs of foo1.sh and foo2.sh, so I am building a dependency. And then I have to repeat it 30 times.
Let's say:
## Resources config
#SBATCH --ntasks=30
#SBATCH --tasks-per-core=1

for i in {1..30}; do
    # foo1 and foo2 are independent, run them simultaneously in the background
    srun -n 1 --jobid=foo1_$i ./foo1.sh &
    srun -n 1 --jobid=foo2_$i ./foo2.sh &
    # foo3 must wait for both foo1 and foo2 of this iteration
    srun -n 1 --jobid=foo3_$i --dependency=afterok:foo1_$i:foo2_$i ./foo3.sh &
done
wait
The idea is that you launch foo1_1 and foo2_1, but since foo3_1 has to wait for the two other jobs to finish, I want to move on to the next iteration. The next iteration launches foo1_2 and foo2_2, foo3_2 waits, and so on.
At some point, then, the number of subjobs launched with srun will exceed --ntasks=30. What is going to happen? Will it wait for a previous job to finish (the behavior I am looking for)?
Thanks
What should happen is that if you kick off more subtasks than you have cores or hyperthreads, the OS scheduling algorithms will handle prioritizing the tasks. Depending on which OS you are running (even if they are all Unix-based), the way this is implemented under the hood will differ.
But you are correct in your assumption that if you run out of cores, then your parallel tasks must, in a sense, 'wait their turn'.
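As a generic, Slurm-independent sketch of that point (the job count of 8 is an arbitrary choice, assumed to be larger than the number of cores on the machine): launching more background processes than cores does not fail, the kernel simply time-slices them and they all eventually finish.

# Launch 8 CPU-bound background jobs, likely more than the number of cores:
for i in $(seq 1 8); do
    ( n=0; while [ "$n" -lt 500000 ]; do n=$((n+1)); done ) &   # small busy loop
done
wait   # returns once all 8 jobs have had their turn on the CPUs
echo "all 8 background jobs finished"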
Slurm will run 30 sruns, but the 31st will wait until a core is freed within your 30-core allocation. Note that the proper argument is --ntasks-per-core=1, and not --tasks-per-core=1.
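With that correction, the resource-request lines of the submission script would look like this (a sketch, keeping the asker's 30-task request):

## Resources config
#SBATCH --ntasks=30
#SBATCH --ntasks-per-core=1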
You can test it yourself by using salloc rather than sbatch to work interactively:
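A minimal sketch of such a test, assuming a two-core allocation (the sleep durations and the echoed text are arbitrary, and the exact output will vary with your cluster's configuration):

# Request an interactive allocation of two tasks, one task per core:
salloc --ntasks=2 --ntasks-per-core=1

# In the shell that salloc opens, launch three one-task steps and time them:
time ( srun -n 1 sleep 10 & srun -n 1 sleep 10 & srun -n 1 echo "third step" & wait )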
You can see that the simple echo took 10 seconds, because the third srun had to wait until the first two had finished, as the allocation is only two cores.