Slurm oversubscribe GPUs

Posted 2019-07-26 09:04

Question:

Is there a way to oversubscribe GPUs on Slurm, i.e. run multiple jobs/job steps that share one GPU? We've only found ways to oversubscribe CPUs and memory, but not GPUs.

We want to run multiple job steps on the same GPU in parallel and optionally specify the GPU memory used for each step.

Answer 1:

The easiest way to do that is to define the GPU as a node feature rather than as a GRES. Slurm then does not manage the GPUs at all; it only makes sure that jobs that need one land on nodes that offer one.
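As a rough illustration of the feature-based approach, here is a minimal Python sketch that builds an `sbatch` command line restricted to nodes carrying a GPU feature. It assumes the admin has tagged the GPU nodes in `slurm.conf` with a feature named `gpu` (the feature name, the node name `node01`, the script name `train.sh` and the helper `build_sbatch_command` are all placeholders, not something from the original answer). Because Slurm does not track the GPU as a resource in this setup, it will not stop several jobs on the same node from using the same device, provided the node's CPU and memory configuration lets more than one job run there.

```python
import shlex

# Assumption: GPU nodes are tagged with a feature in slurm.conf, e.g.
#   NodeName=node01 Features=gpu ...
# ("node01" and "gpu" are placeholder names for this sketch.)


def build_sbatch_command(script, feature="gpu", extra_args=()):
    """Return the sbatch argv that restricts a job to nodes offering `feature`.

    --constraint only selects nodes; it does not allocate or limit the GPU,
    so jobs landing on the same node may end up sharing it.
    """
    return ["sbatch", f"--constraint={feature}", *extra_args, script]


if __name__ == "__main__":
    # Hypothetical job script name; print the command instead of running it.
    cmd = build_sbatch_command("train.sh", extra_args=["--ntasks=1"])
    print(shlex.join(cmd))
```

Since Slurm is not managing the device here, anything like a per-step GPU memory limit would have to be handled by the application itself.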



Tags: gpu slurm