Is there a way in bash/slurm for the script to know which node it is running on?
I sbatch a bash script called wrapCode.sh, and I am monitoring how long the script takes as well as which node it runs on. I know how to track the run time, but is there a way to echo, at the end of the script, which node it ran on?
sstat does this, but it needs the job ID, which the script doesn't seem to know (or at least I haven't been able to find it).
A simple and commonly used way to record in the job output which node the job ran on is to add
srun hostname
to it. The job ID is also available within the job script through the environment variable SLURM_JOB_ID, so you can use
sstat -j $SLURM_JOB_ID
in your slurm script to get the information you want.
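As a rough sketch of how those two pieces could sit at the end of a job script: the function name report_job_location and the #SBATCH job name are made up for illustration, and the "unknown" fallback is only there so the script still runs outside of Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=wrapCode   # illustrative directive, adjust or drop as needed

# Hypothetical helper (not part of Slurm): print the job ID and the node.
# SLURM_JOB_ID is set by Slurm inside the job; "unknown" is a fallback
# for running this script by hand.
report_job_location() {
    job_id="${SLURM_JOB_ID:-unknown}"
    node="$(hostname)"
    echo "Job ${job_id} ran on node ${node}"
}

start=$(date +%s)
# ... the real work of wrapCode.sh goes here ...
end=$(date +%s)

echo "Elapsed: $((end - start)) s"
report_job_location
```

In a single-node job, calling hostname directly in the batch script reports the node the script itself runs on; srun hostname would instead launch it as a job step on each allocated task.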
When you submit a job to the grid, you always get a message that tells you the JobID. If you do this interactively, you will see something like this:
$ sbatch wrapCode.sh
Submitted batch job 106
Therefore, you can write a simple wrapper bash script that submits the job and captures the JobID for you. After that, you can use the scontrol command to get detailed information about the job (including the node), as shown below:
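#!/bin/bash
Command="sbatch wrapCode.sh"
# Submit the job and capture sbatch's message (stdout and stderr).
Submit_Output="$($Command 2>&1)"
# Extract the ID from the line "Submitted batch job <id>".
JobId="$(echo "$Submit_Output" | awk '/Submitted batch job/ {print $4}')"
echo "$JobId"
# --> Sleep here for a few seconds to wait until the job is actually launched
# Read the NodeList field from the job record.
Host="$(scontrol show job "$JobId" | grep ' NodeList' | awk -F'=' '{print $2}')"
echo "$Host"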
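If a fixed sleep feels fragile, one possible variation is to poll until a node shows up. This is only a sketch: the helper names extract_job_id and extract_node_list are invented here, the 30 x 2 s retry limit is arbitrary, and it assumes a pending job reports an empty or "(null)" NodeList. Also note that sbatch has a --parsable option that prints just the job ID, which would make the first helper unnecessary.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative, not part of Slurm.

# Print the job ID from sbatch's "Submitted batch job <id>" message.
extract_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Print the value of the NodeList= field from `scontrol show job` output.
# Anchoring at the start of the line skips ReqNodeList= and ExcNodeList=.
extract_node_list() {
    printf '%s\n' "$1" | awk -F'=' '/^[ \t]*NodeList=/ {print $2}'
}

# Only talk to the scheduler when the Slurm commands are available.
if command -v sbatch >/dev/null 2>&1; then
    job_id="$(extract_job_id "$(sbatch wrapCode.sh 2>&1)")"
    if [ -z "$job_id" ]; then
        echo "Submission failed" >&2
        exit 1
    fi
    echo "Job ID: ${job_id}"

    # Poll until a node has been assigned (assumed: empty or "(null)"
    # means the job has not started yet). The retry limit is arbitrary.
    nodes=""
    for attempt in $(seq 1 30); do
        nodes="$(extract_node_list "$(scontrol show job "$job_id")")"
        if [ -n "$nodes" ] && [ "$nodes" != "(null)" ]; then
            break
        fi
        sleep 2
    done
    echo "Node(s): ${nodes}"
fi
```

Keeping the parsing in small functions makes it possible to check them against saved command output without submitting anything.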
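The job ID for your job can be found in the environment variable SLURM_JOBID (the sbatch documentation lists it as SLURM_JOB_ID, with SLURM_JOBID kept for backward compatibility).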
This variable is automatically set by SLURM upon submission of your job.
As for finding the name of the node running your job, this can be found in the environment variable SLURMD_NODENAME.
The variable SLURM_NODELIST will give you a list of nodes allocated to a job (unless you run a job across multiple nodes, this will only contain one name).
There are a lot of variables that contain information about your job; see https://slurm.schedmd.com/sbatch.html#lbAH
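To show how these variables might be used together, here is a small sketch that could go at the end of the job script. The function name print_job_info and the "unset" placeholder are my own; the fallbacks to SLURM_JOBID and SLURM_NODELIST cover the older variable names.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): summarize where the job runs.
# Each value falls back to "unset" when the script runs outside of Slurm.
print_job_info() {
    echo "job_id=${SLURM_JOB_ID:-${SLURM_JOBID:-unset}}"
    echo "this_node=${SLURMD_NODENAME:-unset}"
    echo "all_nodes=${SLURM_JOB_NODELIST:-${SLURM_NODELIST:-unset}}"
}

print_job_info
```

For a multi-node job the node list is in compressed form (for example node[01-02]); scontrol show hostnames "$SLURM_JOB_NODELIST" expands it to one name per line.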