Limit the number of running jobs in SLURM

Posted 2020-02-11 03:30

Question:

I am queuing multiple jobs in SLURM. Can I limit the number of jobs that run in parallel?

Thanks in advance!

Answer 1:

If you are not the administrator, you can hold some jobs with scontrol hold <JOBID> if you do not want them all to start at the same time, and you can delay the start of others with sbatch --begin=YYYY-MM-DD. If the work is a job array, you can also limit how many of its tasks run concurrently: for instance, --array=1-100%25 submits 100 tasks but lets only 25 of them run at once.
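The hold-and-release workflow can be sketched as a small helper script. This is only an illustration, not part of the original answer: the function names, the DRY_RUN switch, and the job IDs are all made up for the example. By default it prints the scontrol commands instead of running them, so it can be tried on a machine without SLURM.

```shell
#!/bin/bash
# Sketch only: DRY_RUN, run, hold_jobs and release_jobs are hypothetical names.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hold each listed job so that it stays pending until it is released.
hold_jobs() {
    local id
    for id in "$@"; do
        run scontrol hold "$id"
    done
}

# Release previously held jobs so that the scheduler may start them.
release_jobs() {
    local id
    for id in "$@"; do
        run scontrol release "$id"
    done
}

# Placeholder job IDs: hold two jobs, then let one of them go.
hold_jobs 1001 1002
release_jobs 1001
```

On a real cluster you would set DRY_RUN=0 and pass the job IDs that sbatch reported at submission time.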



Answer 2:

According to the SLURM Resource Limits documentation, you can limit the total number of jobs that an association or QOS may have running at once with the MaxJobs parameter. As a reminder, an association is a combination of cluster, account, user name and (optionally) partition name.

You should be able to do something similar to:

sacctmgr modify user <userid> account=<account_name> set MaxJobs=10

I found this presentation to be very helpful in case you have more questions.
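To check which limit is currently in place, the associations can be listed with sacctmgr. The sketch below only builds and prints such a query, so it can be read without access to a cluster. The function name and the user name "alice" are placeholders, and the choice of format columns is an assumption about what is useful to see.

```shell
#!/bin/bash
# Sketch only: maxjobs_query is a hypothetical helper.
# It prints (does not run) a sacctmgr query that lists the MaxJobs value
# of every association belonging to one user.
maxjobs_query() {
    local user=$1
    printf 'sacctmgr show assoc where user=%s format=Cluster,Account,User,Partition,MaxJobs\n' "$user"
}

# "alice" is a placeholder user name.
maxjobs_query alice
```

Copy the printed command into a shell on the cluster to run it. An empty MaxJobs column would mean that no limit is set on that association.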



Answer 3:

According to the SLURM documentation, --array=0-15%4 (note that the range is written with a hyphen, not a colon) limits the number of simultaneously running tasks from this job array to 4.

I wrote test.sbatch:

#!/bin/bash
# test.sbatch
#
#SBATCH -J a
#SBATCH -p campus
#SBATCH -c 1
#SBATCH -o %A_%a.output

mkdir test${SLURM_ARRAY_TASK_ID}

# Sleep for a random time of up to 10 minutes so that the tasks show up in
# squeue and finish at different times, which lets you check that the number
# of tasks running in parallel stays constant.
RANGE=600; number=$((RANDOM % RANGE)); echo "$number"

sleep $number

and ran it with sbatch --array=1-15%4 test.sbatch

The tasks ran as expected (always 4 in parallel): each one just created its directory and then kept running for $number seconds.
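One way to check the "always 4 in parallel" claim is to count the RUNNING lines in squeue output. The sketch below is an assumption about how that could be done, not what was actually used in this test: count_running is a hypothetical helper, and the sample lines are invented so that the script also runs without SLURM.

```shell
#!/bin/bash
# Sketch only: count_running is a hypothetical helper.
# It reads lines of the form "<jobid> <STATE>" on stdin, as produced by
#   squeue -h -r -j <JOBID> -o "%i %T"
# and prints how many of them are in the RUNNING state.
count_running() {
    awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}

# Offline demo with invented sample lines (not real squeue output).
sample='1234_1 RUNNING
1234_2 RUNNING
1234_3 PENDING'
printf '%s\n' "$sample" | count_running
```

On a cluster, replace the sample with the squeue command from the comment. There, -h drops the header and -r prints one line per array task.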

Appreciate comments and suggestions.



Answer 4:

If your jobs are relatively similar, you can use SLURM job arrays. I had been trying to figure this out for a while and found this solution at https://docs.id.unibe.ch/ubelix/job-management-with-slurm/array-jobs-with-slurm

#!/bin/bash -x
#SBATCH --mail-type=NONE
#SBATCH --array=1-419%25  # Submit 419 tasks with only 25 of them running at any time

# File containing the list of 419 commands I want to run, one per line
cmd_file=s1List_170519.txt

# Take the first field of line number SLURM_ARRAY_TASK_ID
cmd_line=$(awk -v var="${SLURM_ARRAY_TASK_ID}" 'NR==var {print $1}' "$cmd_file")

$cmd_line  # may need to be piped to bash
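The line-per-task lookup can be tried outside SLURM with a small stand-alone sketch. It differs from the script above in two ways that are assumptions on my part: it takes the whole line rather than only the first field, and it runs the line through bash -c so that commands with arguments work. The helper name nth_line and the three echo commands are made up for the demo.

```shell
#!/bin/bash
# Sketch only: nth_line is a hypothetical helper.
# Print line N of a file (the whole line, not just its first word).
nth_line() {
    local file=$1 n=$2
    sed -n "${n}p" "$file"
}

# Made-up command list standing in for the real command file.
cmd_file=$(mktemp)
printf '%s\n' 'echo task one' 'echo task two' 'echo task three' > "$cmd_file"

# Inside a job array SLURM sets SLURM_ARRAY_TASK_ID; default to 2 for the demo.
task_id=${SLURM_ARRAY_TASK_ID:-2}
cmd_line=$(nth_line "$cmd_file" "$task_id")

# Run the selected line as a shell command.
bash -c "$cmd_line"

rm -f "$cmd_file"
```

Run without SLURM_ARRAY_TASK_ID set, this executes the second line of the list and prints "task two".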


Tags: slurm