I am queuing multiple jobs in SLURM. Can I limit the number of parallel running jobs in slurm?
Thanks in advance!
If you are not the administrator, you can hold some jobs if you do not want them all to start at the same time, with scontrol hold <JOBID>, and you can delay the start of some jobs with sbatch --begin=YYYY-MM-DD. Also, if it is a job array, you can limit the number of tasks in the array that run concurrently with, for instance, --array=1-100%25 to have 100 tasks in the array but only 25 of them running at any time.
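As a rough sketch of the hold approach: the helper below (hold_all_but is a made-up name, not a Slurm command) reads job IDs on stdin and prints a scontrol hold command for every ID after the first N. It only prints the commands, so it can be tried without a cluster; the job IDs in the example are invented.

```shell
#!/bin/bash
# Hypothetical helper: print "scontrol hold <id>" for every job ID on stdin
# except the first $1. Nothing is executed; pipe the output to bash to apply it.
hold_all_but() {
    local keep=$1
    # tail -n +K starts output at line K, so this skips the first $keep IDs.
    tail -n +"$((keep + 1))" | while read -r id; do
        echo "scontrol hold $id"
    done
}

# Example with made-up job IDs: keep 2 jobs eligible, hold the rest.
printf '%s\n' 101 102 103 104 | hold_all_but 2
```

On a cluster the IDs could come from squeue -u "$USER" -t PENDING -h -o %i. Held jobs do not start on their own; each one has to be released later with scontrol release <JOBID>, so this is a manual throttle rather than an automatic one.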
According to the SLURM Resource Limits documentation, an administrator can limit the total number of jobs that can run at one time for an association/QOS with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optional) partition name.
You should be able to do something similar to:
sacctmgr modify user <userid> account=<account_name> set MaxJobs=10
I found this presentation to be very helpful in case you have more questions.
According to the SLURM documentation, --array=0-15%4 (with a - sign, not a :) will limit the number of simultaneously running tasks from this job array to 4.
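To make the difference between the separators concrete, here is a small sketch that splits an --array value into its parts. describe_array is a hypothetical helper written for illustration, and it only handles the single-range form first-last[:step][%limit], not comma-separated lists. In Slurm's syntax, - gives a range, : gives a step, and % gives the limit on simultaneously running tasks.

```shell
#!/bin/bash
# Hypothetical helper: describe an --array value of the form
# first-last[:step][%limit]. Comma-separated lists are not handled.
describe_array() {
    local spec=$1 limit=none step=1
    if [[ $spec == *%* ]]; then
        limit=${spec##*%}      # text after the % is the concurrency limit
        spec=${spec%\%*}       # drop the %limit suffix
    fi
    if [[ $spec == *:* ]]; then
        step=${spec##*:}       # text after the : is the step
        spec=${spec%:*}        # drop the :step suffix
    fi
    echo "indices ${spec%-*} to ${spec#*-}, step $step, max running $limit"
}

describe_array '0-15%4'   # prints: indices 0 to 15, step 1, max running 4
describe_array '0-15:4'   # prints: indices 0 to 15, step 4, max running none
```

So 0-15%4 is 16 tasks with at most 4 running, while 0-15:4 is only tasks 0, 4, 8 and 12 with no limit. For an array that is already submitted, the limit can be changed with scontrol update JobId=<JOBID> ArrayTaskThrottle=<count>.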
I wrote test.sbatch:
#!/bin/bash
# test.sbatch
#
#SBATCH -J a
#SBATCH -p campus
#SBATCH -c 1
#SBATCH -o %A_%a.output
mkdir "test${SLURM_ARRAY_TASK_ID}"
# Sleep for a random time of up to 10 minutes so the tasks show up in squeue
# and finish at different times, to check that the number of parallel tasks stays constant
number=$((RANDOM % 600)); echo "$number"
sleep "$number"
and ran it with sbatch --array=1-15%4 test.sbatch
The jobs ran as expected (always 4 in parallel): each one just created its directory and kept running for $number seconds.
Appreciate comments and suggestions.
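One way to check the "always 4 in parallel" claim while the array is running is to count the tasks in state R. The sketch below assumes squeue output in the form "<jobid>_<index> <state>", which is what the format string '%i %t' should give; count_running is a made-up helper name and the sample lines are invented so the snippet can be tried without a cluster.

```shell
#!/bin/bash
# Hypothetical helper: count lines whose second field is R (running).
# Expects "<jobid>_<index> <state>" lines on stdin.
count_running() {
    awk '$2 == "R" { n++ } END { print n + 0 }'
}

# Invented sample standing in for real squeue output.
printf '%s\n' '1234_1 R' '1234_2 R' '1234_3 PD' | count_running
```

On a cluster the input would come from something like squeue -h -r -j <JOBID> -o '%i %t' | count_running, where -r lists one array task per line and -h drops the header.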
If your jobs are relatively similar you can use the slurm array functions. I had been trying to figure this out for a while and found this solution at https://docs.id.unibe.ch/ubelix/job-management-with-slurm/array-jobs-with-slurm
#!/bin/bash -x
#SBATCH --mail-type=NONE
#SBATCH --array=1-419%25 # Submit 419 tasks with only 25 of them running at any time
# File containing the list of 419 commands I want to run, one per line
cmd_file=s1List_170519.txt
# Take the first whitespace-separated field of the line whose number matches the task ID
cmd_line=$(awk -v var="${SLURM_ARRAY_TASK_ID}" 'NR==var {print $1}' "$cmd_file")
$cmd_line # may need to be piped to bash
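Note that print $1 keeps only the first word of the selected line, so this works when each line is a single script name or path. If the lines hold full commands with arguments, a variant along these lines may be closer to what is wanted. It is only a sketch: run_line is a hypothetical helper, and the command file here is a temporary one created for the demonstration.

```shell
#!/bin/bash
# Hypothetical helper: run the whole Nth line of a command file.
run_line() {
    local file=$1 n=$2 cmd
    # sed -n "Np" prints only line N, arguments included.
    cmd=$(sed -n "${n}p" "$file")
    # bash -c lets quoting, pipes and redirections inside the line work.
    bash -c "$cmd"
}

# Demonstration file with one complete command per line (made up).
demo_file=$(mktemp)
printf '%s\n' 'echo first task' 'echo second task' 'echo third task' > "$demo_file"

# Outside Slurm SLURM_ARRAY_TASK_ID is unset, so fall back to 2 for a local dry run.
run_line "$demo_file" "${SLURM_ARRAY_TASK_ID:-2}"
rm -f "$demo_file"
```

Inside the array job the call would be run_line "$cmd_file" "$SLURM_ARRAY_TASK_ID".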