According to the SLURM Resource Limits documentation, you can limit the total number of jobs that you can run for an association/qos with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optional) partition name.
You should be able to do something similar to:
sacctmgr modify user <userid> account=<account_name> set MaxJobs=10
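To verify that the limit took effect, something along these lines should work (a sketch; <userid> and <account_name> are the same placeholders as above, and the format fields can be adjusted):

sacctmgr show assoc where user=<userid> account=<account_name> format=cluster,account,user,partition,maxjobs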
I found this presentation to be very helpful in case you have more questions.
If your jobs are relatively similar you can use the SLURM array functions. I had been trying to figure this out for a while and found this solution at https://docs.id.unibe.ch/ubelix/job-management-with-slurm/array-jobs-with-slurm

#!/bin/bash -x
#SBATCH --mail-type=NONE
#SBATCH --array=1-419%25   # Submit 419 tasks, with only 25 of them running at any one time
# s1List_170519.txt contains the list of 419 commands I want to run, one command per line
cmd_file=s1List_170519.txt

# Pick out the command on line ${SLURM_ARRAY_TASK_ID} of the file
cmd_line=$(awk -v var=${SLURM_ARRAY_TASK_ID} 'NR==var {print $0}' "$cmd_file")

$cmd_line   # may need eval or piping to bash if the commands contain quotes, pipes or redirections
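As a usage sketch (sizing the array with wc and the script name run_cmds.sbatch are my own assumptions, not part of the answer above), the array range can be matched to the number of commands in the file:

N=$(wc -l < s1List_170519.txt)   # 419 in this case
sbatch --array=1-${N}%25 run_cmds.sbatch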
According to the SLURM documentation, --array=0-15%4 (note the - sign in the range, not :) will limit the number of simultaneously running tasks from this job array to 4.
I wrote test.sbatch:
#!/bin/bash
# test.sbatch
#
#SBATCH -J a
#SBATCH -p campus
#SBATCH -c 1
#SBATCH -o %A_%a.output
mkdir test${SLURM_ARRAY_TASK_ID}
# Sleep for up to 10 minutes so the tasks are visible in squeue, and for
# different durations to check that the number of parallel jobs remains constant
RANGE=600; number=$RANDOM; let "number %= $RANGE"; echo "$number"
sleep $number
and ran it with sbatch --array=1-15%4 test.sbatch
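While the array is running, a quick check along these lines (a sketch; -r lists each array element on its own line) should never show more than four tasks in the RUNNING state:

squeue -u $USER -r -t RUNNING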
Jobs ran as expected (always 4 in parallel): each one just created its directory and kept running for $number seconds. Appreciate comments and suggestions.
If you are not the administrator, you can hold some jobs if you do not want them all to start at the same time, with scontrol hold <JOBID>, and you can delay the submission of some jobs with sbatch --begin=YYYY-MM-DD. Also, if it is a job array, you can limit the number of jobs in the array that run concurrently with, for instance, --array=1-100%25 to have 100 jobs in the array but only 25 of them running at any one time.
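As a concrete sketch (the job ID, the date, and the script name job.sbatch are placeholders, not taken from the answer above):

scontrol hold 12345                          # keep job 12345 pending until released
scontrol release 12345                       # allow it to be scheduled again
sbatch --begin=2025-01-15T02:00 job.sbatch   # do not start this job before the given time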