I want to run multiple worker daemons on a single machine. According to damienfrancois's answer to "what is the minimum number of computers for a slurm cluster", it can be done. The problem is that I can currently run only one worker daemon per machine.
For example, when I run
sudo slurmd -N linux1 -cDvv
sudo slurmd -N linux2 -cDvv
linux1 goes down when I run linux2. Is it possible to run multiple worker daemons on one machine?
Here is my slurm.conf file
As your intention seems to be just testing the behavior of Slurm, I would recommend using front-end mode, in which you can create dummy compute nodes on the same machine.
Their FAQ has more details, but basically you must build your installation with this mode enabled:
./configure --enable-front-end
Then configure the nodes in slurm.conf:
NodeName=test[1-100] NodeHostName=localhost
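The bracket expression in NodeName is a range: test[1-100] stands for the 100 dummy nodes test1 through test100, all mapped to localhost. A minimal Python sketch of that expansion follows; it handles only a single prefix[lo-hi] range (real Slurm hostlists also allow commas, multiple ranges and zero padding), and the function name is hypothetical.

```python
import re


def expand_simple_hostlist(expr: str) -> list[str]:
    """Expand a single 'prefix[lo-hi]' range, e.g. 'test[1-3]' -> test1..test3.

    Simplified sketch: commas, multiple ranges and zero-padded numbers,
    which real Slurm hostlists support, are not handled here.
    """
    m = re.fullmatch(r"(.*)\[(\d+)-(\d+)\]", expr)
    if m is None:
        return [expr]  # plain node name without a range
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]


names = expand_simple_hostlist("test[1-100]")
print(len(names), names[0], names[-1])  # 100 test1 test100
```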
That guide also explains how to launch more than one real daemon on the same node by giving each one a different port, but for my testing purposes it was not necessary.
Good luck!
I had the same issue. I resolved it by changing the log file paths as described in the "multiple slurmd support" documentation.
For example, in your slurm.conf these lines
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
must be changed to
SlurmdLogFile=/var/log/slurm/slurmd.%n.log
SlurmdPidFile=/var/run/slurmd.%n.pid
SlurmdSpoolDir=/var/spool/slurmd.%n
Now you can launch multiple slurmd.
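The %n placeholder is replaced by the node name the daemon was started with (slurmd -N linux1, for example), so each daemon gets its own log file, PID file and spool directory instead of clobbering a shared one. The Python sketch below only mimics that substitution to show the resulting paths; it is not Slurm's own code, and the helper name is hypothetical.

```python
def per_node_paths(node_name: str) -> dict[str, str]:
    # Templates copied from the slurm.conf lines above; '%n' stands for the
    # node name passed to slurmd with -N.
    templates = {
        "SlurmdLogFile": "/var/log/slurm/slurmd.%n.log",
        "SlurmdPidFile": "/var/run/slurmd.%n.pid",
        "SlurmdSpoolDir": "/var/spool/slurmd.%n",
    }
    return {key: t.replace("%n", node_name) for key, t in templates.items()}


p1, p11 = per_node_paths("linux1"), per_node_paths("linux11")
print(p1["SlurmdPidFile"])   # /var/run/slurmd.linux1.pid
print(p11["SlurmdPidFile"])  # /var/run/slurmd.linux11.pid
# No path is shared, so the two daemons no longer overwrite each other's files.
assert all(p1[k] != p11[k] for k in p1)
```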
Note: I tried your slurm.conf, and I think some parameters are missing: you need to define two NodeName lines instead of one, and specify which Port each node uses.
This works for me:
# COMPUTE NODES
NodeName=linux[1-10] NodeHostname=linux0 Port=17004 CPUs=1 State=UNKNOWN
NodeName=linux[11-19] NodeHostname=linux0 Port=17005 CPUs=1 State=UNKNOWN
# PARTITIONS
PartitionName=main Nodes=linux1 Default=YES MaxTime=INFINITE State=UP
PartitionName=dev Nodes=linux11 Default=NO MaxTime=INFINITE State=UP
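In the config above, linux1 and linux11 run on the same host (linux0) but listen on different ports (17004 and 17005). Assuming each slurmd binds its node's configured Port, two daemons on the same host cannot share a port, so linux1 and linux2 (both on 17004) could not run at the same time. The Python sketch below checks a planned set of daemons for such clashes. The NODES table is a hypothetical hand-built copy of the NodeName lines, not something read from Slurm.

```python
from collections import defaultdict

# Hypothetical table mirroring the NodeName lines above:
# node name -> (NodeHostname, Port).
NODES = {}
for i in range(1, 11):
    NODES[f"linux{i}"] = ("linux0", 17004)
for i in range(11, 20):
    NODES[f"linux{i}"] = ("linux0", 17005)


def port_conflicts(to_start):
    """Return {(host, port): [names]} for addresses claimed by more than one node."""
    claims = defaultdict(list)
    for name in to_start:
        claims[NODES[name]].append(name)
    return {addr: names for addr, names in claims.items() if len(names) > 1}


print(port_conflicts(["linux1", "linux11"]))  # {}
print(port_conflicts(["linux1", "linux2"]))   # {('linux0', 17004): ['linux1', 'linux2']}
```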