Could SLURM trigger a script(implemented by the fr

Posted 2019-05-15 17:18

Question:

As we know, SLURM can send an e-mail when a job is completed.

In addition to that, and similar to the mailing mechanism that fires when a job is completed:

[Q] Could SLURM trigger a script (implemented by the frontend-SLURM user) when any job is completed?

Example solution: This would force me to use a while loop that checks and waits until the submitted job is completed, which might consume additional CPU.

jobID=$(sbatch -U user -N1 run.sh | cut -d " " -f4-)
# Read the state from the last line of sacct output, without padding spaces.
job_state=$(sacct -j "$jobID" --format=state | tail -n1 | tr -d ' ')
# Busy-wait until the job reports COMPLETED.
while [ "$job_state" != "COMPLETED" ]
do
    job_state=$(sacct -j "$jobID" --format=state | tail -n1 | tr -d ' ')
done
my_script.sh  # When any job is completed I want SLURM to trigger my_script.sh if possible.

Please note that I have been told that checking in a while loop every second might be inefficient. Is running `while ps -p $PID; do sleep 1; done` until a script completes efficient?

Thank you for your valuable time and help.

Answer 1:

An option would be to (ab)use the MailProg option in slurm.conf. It is initially meant to be the fully qualified path of a program used to send emails upon job completion to the users. But that program can do anything else. It receives the job ID and some other information through the command line arguments.

So you could configure Slurm with MailProg=/path/to/my_script.sh. You also need to make sure the client adds the --mail-type option, or that it is added automatically through a job submit plugin.
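As a rough sketch of the two sides of that setup: the admin sets MailProg in slurm.conf, and the user submits with mail events enabled. The wrapper name submit_with_hook is hypothetical, and the event list BEGIN,END,FAIL is just one reasonable choice.

```shell
#!/bin/bash
# Sketch only: submit_with_hook is a hypothetical wrapper name.
# Admin side, in slurm.conf (path taken from the text above):
#   MailProg=/path/to/my_script.sh

# User side: request mail events so that Slurm invokes MailProg
# when the job begins, ends, or fails.
submit_with_hook() {
    local user="$1" script="$2"
    sbatch --mail-type=BEGIN,END,FAIL --mail-user="$user" "$script"
}
```

It would be called as `submit_with_hook "$USER" run.sh`.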

The script could have the following structure (untested):

#!/bin/bash

# First do the wanted behaviour
jobid=$(echo "$2" | cut -d= -f2 | cut -d' ' -f1 | cut -d_ -f1)
# The status is the fourth word of the subject; drop its trailing comma
event=$(echo "$2" | awk '{print $4}' | tr -d ',')

case $event in
Started)
    job_startup_script $jobid 
    ;; 
Ended|Failed|TIMEOUT)
    job_end_script $jobid
    ;; 
esac

# Then send the email to get the usual behaviour
/bin/mail "$@"

The script will receive from Slurm arguments looking like this:

SLURM Job_id=<Job-ID> Name=<JobName> <Status>, Run time <RunTime>
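To check the parsing without a running cluster, the extraction can be tried on a hand-written string in that format. This is only a sketch: parse_jobid and parse_event are hypothetical helper names, and the sample values (1234, run.sh, 00:05:12) are made up.

```shell
#!/bin/bash
# Sketch only: parse a subject line of the form shown above.

# Print the job ID; anything after an underscore is dropped.
parse_jobid() {
    echo "$1" | cut -d= -f2 | cut -d' ' -f1 | cut -d_ -f1
}

# Print the status word: the fourth field, without its trailing comma.
parse_event() {
    echo "$1" | awk '{print $4}' | tr -d ','
}

# Made-up sample subject, for illustration only.
subject='SLURM Job_id=1234 Name=run.sh Ended, Run time 00:05:12'
echo "job $(parse_jobid "$subject") -> $(parse_event "$subject")"
```

Note that this relies on the job name containing no spaces; otherwise the status is no longer the fourth word.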

If the script job_startup_script takes a long time to run, start it with nohup and append an ampersand (&) so that it runs as a background process and does not block the mail program.
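A minimal sketch of that pattern, assuming the hook's output can be discarded; run_detached is a hypothetical helper name, and `sleep 1` stands in for a slow job_startup_script.

```shell
#!/bin/bash
# Sketch only: run_detached is a hypothetical helper name.
run_detached() {
    # nohup makes the command ignore SIGHUP; the trailing & runs it in the
    # background so the caller returns immediately. Output is discarded.
    nohup "$@" >/dev/null 2>&1 &
}

# Stand-in for a slow hook: control returns at once while sleep keeps running.
run_detached sleep 1
echo "hook launched in background with PID $!"
```

Inside the MailProg script this would read `run_detached job_startup_script "$jobid"`.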

Also make sure all the scripts are readable and executable by SlurmUser.



Tags: slurm