`srun` drop-in replacement

Posted 2019-08-23 05:47

Question:

I'm trying to create a function that serves as a drop-in replacement for SLURM's srun command. I need this wrapper because I want to write a script that uses srun when it is started under SLURM control, but that can still be run without SLURM.

So far, I have this function:

srun_wrap() {
    if [ -z "$SLURM_JOB_ID" ]
    then
        # Not running under SLURM so start the code without srun
        "${@:2}"
    else
        # A SLURM job ID found, so use srun
        srun ${@:1:1} "${@:2}"
    fi
}

This allows me to transform a line like

srun --job-name listing ls

to a line like

srun_wrap "--job-name listing" ls

Quite close to a drop-in, but not yet.

The rationale is:

  • Check whether the variable $SLURM_JOB_ID has a value.
    • If it is empty, we are not under SLURM and should run the command without srun. The parameter expansion "${@:2}" skips the first argument (the srun parameters) and runs the rest of the command line.
    • If it has a value, we are in a SLURM allocation, so use srun. The command line is formed from srun, then the first parameter left unquoted so that it is split into separate words that srun can parse, and finally the real command line, properly quoted.

This approach still has three drawbacks:

  1. The srun parameters in the parameter expansion have to be left unquoted; otherwise srun does not parse them properly.
  2. The srun parameters have to be passed in quotes when calling the wrapper, so that they are treated as a single parameter.
  3. I'm forced to always pass a parameter to the wrapper, even an empty one. A srun ls has to be written as srun_wrap "" ls.

Any ideas on how to overcome those three drawbacks? How can I quote the parameter expansion, avoid quoting the srun parameters, and avoid the need for an empty parameter?

Answer 1:

This is what I came up with. A shell wizard could probably do better. Does this solve all your drawbacks?

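It does introduce a new drawback that I'm working on: a flag and its value can't be separated by a space, because the first argument that doesn't start with '-' is taken as the start of the command. So you have to use them like this: srun_wrap --ntasks=2 ls or srun_wrap -n2 ls.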

The calls to echo and the function call at the end are just for debugging.

#!/bin/bash

function srun_wrap {
        # Keep flags and command as arrays so arguments containing spaces
        # survive intact; local keeps them from leaking between calls
        local flags=() cmd=()
        local i=1 var
        for var in "$@"; do
                # args starting with '-' up to the first that doesn't are srun
                # flags. The rest of the args are the command
                if [[ $var == -* ]]; then
                        flags+=( "$var" )
                else
                        cmd=( "${@:$i}" )
                        break
                fi
                i=$((i+1))
        done
        echo "flags = ${flags[*]}"
        echo "cmd = ${cmd[*]}"

        if [ -n "$SLURM_JOB_ID" ]; then
                # run in slurm (no eval, so the original quoting is preserved)
                srun "${flags[@]}" "${cmd[@]}"
        else
                # run outside slurm
                "${cmd[@]}"
        fi
}

srun_wrap "$@"

To get around the flags problem, you could define a flag yourself (like --cmd or something) that separates srun flags from the executable.

srun_wrap --ntasks 2 --exclusive --cmd ls -alh
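
A minimal sketch of that separator idea, assuming a made-up --cmd marker (it is not a real srun option, so the wrapper strips it before calling srun). Everything before the marker is collected as srun flags, everything after it is the command:

```bash
#!/bin/bash

# Sketch only: --cmd is a hypothetical marker invented for this wrapper.
srun_wrap() {
    local flags=()

    # Collect arguments until the marker is found
    while [ "$#" -gt 0 ]; do
        if [ "$1" = "--cmd" ]; then
            shift    # drop the marker itself
            break
        fi
        flags+=( "$1" )
        shift
    done

    # If the marker was missing, nothing is left to run
    if [ "$#" -eq 0 ]; then
        echo "srun_wrap: no command given after --cmd" >&2
        return 2
    fi

    if [ -n "$SLURM_JOB_ID" ]; then
        # Inside a SLURM allocation: pass the flags through to srun
        srun "${flags[@]}" "$@"
    else
        # Outside SLURM: ignore the srun flags and run the command directly
        "$@"
    fi
}
```

Because the flags are kept in an array, values separated by a space (--ntasks 2) work again, at the cost of the call no longer looking exactly like a plain srun call.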

Or you could be more explicit like sbatch and create a --wrap=<command> flag that you parse yourself. You would have to quote it, though.

srun_wrap --ntasks 2 --exclusive --wrap="ls -alh"
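
Roughly how the --wrap variant could look. This is a sketch under my own assumptions: --wrap is parsed by the wrapper itself and never passed to srun, and the wrapped string is handed to bash -c, which is my choice here and not something srun does for you:

```bash
#!/bin/bash

# Sketch only: --wrap=<command> is handled by this wrapper, not by srun.
srun_wrap() {
    local flags=() wrapped="" have_wrap=0 arg

    for arg in "$@"; do
        case "$arg" in
            --wrap=*)
                # Strip the "--wrap=" prefix and keep the command string
                wrapped="${arg#--wrap=}"
                have_wrap=1
                ;;
            *)
                # Anything else is assumed to be meant for srun
                flags+=( "$arg" )
                ;;
        esac
    done

    if [ "$have_wrap" -eq 0 ]; then
        echo "srun_wrap: missing --wrap=<command>" >&2
        return 2
    fi

    if [ -n "$SLURM_JOB_ID" ]; then
        # Inside SLURM: let srun start a shell that runs the string
        srun "${flags[@]}" bash -c "$wrapped"
    else
        # Outside SLURM: run the string in a shell directly
        bash -c "$wrapped"
    fi
}
```

Since the command string goes through a shell, pipes and redirections inside the quotes work too, but you also take on the usual quoting headaches of bash -c.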

The only other option is to create an option parser yourself that can detect valid srun flags. That would be a lot of work, with a potential for bugs.
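
To give an idea of what that would involve, here is a very rough sketch. It assumes only a handful of srun options that take a separate value (-n/--ntasks, -N/--nodes, -c/--cpus-per-task, -J/--job-name, -p/--partition, -t/--time); the list is deliberately incomplete and would have to be extended from the srun man page before relying on it:

```bash
#!/bin/bash

# Sketch only: the list of value-taking options below is partial.
srun_wrap() {
    local flags=()

    while [ "$#" -gt 0 ]; do
        case "$1" in
            -n|--ntasks|-N|--nodes|-c|--cpus-per-task|-J|--job-name|-p|--partition|-t|--time)
                # Known option whose value is the next word
                if [ "$#" -lt 2 ]; then
                    echo "srun_wrap: option $1 needs a value" >&2
                    return 2
                fi
                flags+=( "$1" "$2" )
                shift 2
                ;;
            -*)
                # Any other option: assumed to be a single word,
                # e.g. --exclusive, --ntasks=2 or -n2
                flags+=( "$1" )
                shift
                ;;
            *)
                # First non-option word starts the command
                break
                ;;
        esac
    done

    if [ "$#" -eq 0 ]; then
        echo "srun_wrap: no command given" >&2
        return 2
    fi

    if [ -n "$SLURM_JOB_ID" ]; then
        srun "${flags[@]}" "$@"
    else
        "$@"
    fi
}
```

With this, srun_wrap --job-name listing ls should behave like the original srun line, but any value-taking option missing from the list will have its value mistaken for the command.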



Tags: bash slurm