Question:
I would like to set up a SLURM cluster. What is the minimum number of machines I need? Can I start with 2 machines (one acting only as a client, and one acting as both client and server)?
Answer 1:
You can start with a single machine, but two machines is the most standard configuration: one machine is the controller and the other is the "worker" node. With this model you can add as many worker nodes to the cluster as you like. The controller then does not execute jobs, so it does not suffer interference from running jobs.
Answer 2:
As @Carles wrote, you can use only one computer if you want, running both the controller (slurmctld) and the worker (slurmd) daemons.
If you want to test some configurations and observe Slurm's behavior, you can even run multiple worker daemons on a single machine to simulate a larger cluster, using the -N <hostname> option of slurmd.
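As a rough sketch of that idea, the snippet below builds one slurmd command line per simulated node. The node names (node1, node2, ...) and the helper name slurmd_commands are hypothetical; each name would have to match a NodeName entry in your slurm.conf, and as far as I know Slurm has to be built with multiple-slurmd support for this to work. The script only prints the commands, it does not start anything.

```python
def slurmd_commands(node_names):
    """Return one slurmd argument list per simulated node.

    Each list has the form ["slurmd", "-N", <node name>], i.e. the -N
    option mentioned above. The node names are illustrative only.
    """
    return [["slurmd", "-N", name] for name in node_names]


if __name__ == "__main__":
    # Hypothetical node names; use the ones defined in your slurm.conf.
    names = [f"node{i}" for i in range(1, 4)]
    for argv in slurmd_commands(names):
        print(" ".join(argv))
```

You would then run each printed command (typically as root) on the same machine, one daemon per simulated node.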
If you want to actually get some computation done, you can run the controller and the worker daemon on the same node. If you want the system to remain responsive, configure Slurm so that it believes the system has 1 core and 2 GB of RAM less than it actually has; this leaves some room for the OS and the Slurm daemons.
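To make the "1 core and 2 GB less" advice concrete, here is a minimal sketch that computes a node definition line for slurm.conf with the reduced values. The helper name node_line, the host name and the hardware figures are assumptions for illustration; CPUs and RealMemory (in megabytes) are the usual slurm.conf node parameters, but check them against the documentation for your Slurm version.

```python
def node_line(name, cpus, real_memory_mb, reserve_cpus=1, reserve_mb=2048):
    """Build a slurm.conf node line that under-reports the hardware.

    cpus and real_memory_mb describe the real machine; the reserved
    amounts (default: 1 core and 2048 MB) are held back for the OS and
    the Slurm daemons.
    """
    usable_cpus = cpus - reserve_cpus
    usable_mb = real_memory_mb - reserve_mb
    if usable_cpus < 1 or usable_mb < 1:
        raise ValueError("machine is too small for the requested reservation")
    return f"NodeName={name} CPUs={usable_cpus} RealMemory={usable_mb}"


if __name__ == "__main__":
    # Hypothetical machine: 8 cores and 16000 MB of RAM.
    print(node_line("node1", 8, 16000))
```

For the hypothetical 8-core, 16000 MB machine this yields a line declaring 7 CPUs and 13952 MB, which you would paste into slurm.conf in place of the real figures.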
As a side note, the pages you link in your question correspond to a very old version of Slurm. The current version of the documentation is hosted on SchedMD's website.