SLURM job taking up entire node when using just one GPU

Posted 2020-03-05 03:49

Question:

I am submitting multiple jobs to a SLURM queue. Each job uses 1 GPU, and we have 4 GPUs per node. However, once a job is running it takes up the entire node, leaving the other 3 GPUs idle. Is there any way to avoid this, so that I can send multiple jobs to one node, each using one GPU?

My script looks like this:

#!/bin/bash
#SBATCH --gres=gpu:1          # request a single GPU
#SBATCH --ntasks-per-node=1   # one task per node
#SBATCH -p ghp-queue          # target partition
myprog.exe
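
For context, this is a minimal sketch of how several such one-GPU jobs might be submitted; the script name job.sh and the count of four are placeholders for illustration, and the jobs will only share a node once the partition allows it (see the answer below):

# submit four independent one-GPU jobs
for i in 1 2 3 4; do
    sbatch job.sh
done
# check whether they are running concurrently
squeue -u "$USER"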

Answer 1:

I was also unable to run multiple jobs on different GPUs. What helped was adding OverSubscribe=FORCE to the partition configuration in slurm.conf, like this:

PartitionName=compute Nodes=ALL ... OverSubscribe=FORCE

After that, I was able to run four jobs with --gres=gpu:1, and each one took a different GPU (a fifth job is queued, as expected).
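
For reference, a sketch of the surrounding slurm.conf settings that per-GPU scheduling typically relies on; node and partition names, GPU counts, and the SelectTypeParameters value are placeholders, not taken from the question:

# excerpt from slurm.conf (placeholder names and counts)
GresTypes=gpu
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
NodeName=node[01-04] Gres=gpu:4 ...
PartitionName=compute Nodes=ALL OverSubscribe=FORCE ...
# apply the change on a running cluster:
# scontrol reconfigure

To verify which GPU each job actually received, the detailed job view (scontrol -d show job <jobid>) should list the allocated GRES index per node.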



Tags: slurm