Below are three different sbatch scripts that produce roughly similar results. (I show only the parts where the scripts differ; the ## prefix indicates the output obtained by submitting the scripts to sbatch.)
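For concreteness, a complete runnable version of, say, Script 0 could be as small as this (the bash shebang and the file name script0.sh are placeholders on my part; the other scripts only swap the lines shown below):

#!/bin/bash
#SBATCH -n 4
srun -l hostname -s

which I would submit with sbatch script0.sh.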
Script 0
#SBATCH -n 4
srun -l hostname -s
## ==> slurm-7613732.out <==
## 0: node-73
## 1: node-73
## 2: node-73
## 3: node-73
Script 1
#SBATCH -n 1
#SBATCH -a 1-4
srun hostname -s
## ==> slurm-7613733_1.out <==
## node-72
##
## ==> slurm-7613733_2.out <==
## node-73
##
## ==> slurm-7613733_3.out <==
## node-72
##
## ==> slurm-7613733_4.out <==
## node-73
Script 2
#SBATCH -N 4
srun -l -n 4 hostname -s
## ==> slurm-7613738.out <==
## 0: node-74
## 2: node-76
## 1: node-75
## 3: node-77
Q: Why would one choose one of these approaches over the others?
(I see that the tasks spawned by Script 0 all ran on the same node, but I can't tell whether this is a coincidence.)
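(I could presumably confirm the node assignments after the fact with something like

sacct -j 7613732 -o JobID,NodeList

assuming accounting is enabled on this cluster, but my question is about which approach to prefer, not about this particular run.)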
Also, the following variant of Script 2 (the only difference being -N 2 instead of -N 4) fails:
Script 3
#SBATCH -N 2
srun -l -n 4 hostname -s
## ==> slurm-7614825.out <==
## srun: error: Unable to create job step: More processors requested than permitted
Ditto for the following variant of Script 3 (the only difference being that here srun also gets the flag -c 2):
Script 4
#SBATCH -N 2
srun -l -n 4 -c 2 hostname -s
## ==> slurm-7614827.out <==
## srun: error: Unable to create job step: More processors requested than permitted
Qs: Are the errors I get with Scripts 3 and 4 due to wrong syntax, wrong semantics, or site-specific configuration? In other words, is there something inherently wrong with these scripts that would make them fail under any SLURM installation, or do the errors stem only from restrictions imposed by the particular SLURM instance I'm submitting the jobs to? If the latter, how can I pinpoint the configuration settings responsible for the error?
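For what it's worth, I imagine the relevant limits could be inspected with something along these lines (a guess on my part; I don't know which of these settings, if any, actually governs the error):

# CPUs and memory reported for each node
sinfo -N -o '%N %c %m'

# details for one of the nodes my jobs landed on
scontrol show node node-73

# cluster-wide settings that might cap tasks/CPUs per node
scontrol show config | grep -iE 'SelectType|MaxTasksPerNode|CPUs'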