Excluding nodes from a qsub command under SGE

Posted 2019-04-04 13:42

Question:

I have more than 200 jobs that I need to submit to an SGE cluster. I'll be submitting them to two queues. One of the queues has a machine that I don't want to submit jobs to. How can I exclude that machine? The only thing I've found that might help is the following (assuming three valid nodes are available to q1 and all of the nodes available to q2 are valid):

qsub -q q1.q@n1,q1.q@n2,q1.q@n3,q2.q

Answer 1:

Assuming the node you don't want to run on is called n4, adding the following to your script should work.

#$ -l h=!n4
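
The same request can also be passed on the qsub command line, where it is safer to quote it, since an unquoted ! can trigger history expansion in an interactive bash session. The following is a minimal sketch, assuming bash. build_exclude_request is a hypothetical helper (not part of SGE), and the !(a|b) form for several hosts relies on Grid Engine's expression syntax for resource requests, which may differ between versions:

```shell
#!/bin/bash
# Build an SGE host-exclusion request such as h=!n4 or h=!(n4|n7).
# build_exclude_request is a hypothetical helper, not an SGE command.
build_exclude_request() {
    if [ "$#" -eq 0 ]; then
        echo "usage: build_exclude_request host [host ...]" >&2
        return 1
    fi
    if [ "$#" -eq 1 ]; then
        printf 'h=!%s\n' "$1"
    else
        # Join all host names with "|" by setting IFS locally.
        local IFS='|'
        printf 'h=!(%s)\n' "$*"
    fi
}

# Example (needs a real SGE cluster, so it is left commented out):
# qsub -q q1.q,q2.q -l "$(build_exclude_request n4)" my_prog.sh
```

Calling build_exclude_request n4 prints h=!n4, the same request as the script directive above.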


Answer 2:

The best way I've found for this is to set up a custom resource on the nodes that you want to allow the execution on, then require that resource when you submit the job.

In qmon, go to the "complex" configuration and add a new attribute. Set the name to something like "my_test" and the shortcut to something like "m_t", the type to BOOL, the relation to ==, requestable to Yes, and consumable to No, then "Add" it. Commit your changes to the complex configuration.
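
If you would rather not use qmon, the same attribute can probably be added from the command line by dumping the complex list, appending one line, and loading it back. This is a sketch under assumptions: add_complex_line is a hypothetical helper, the name my_test and shortcut m_t are the illustrative values used by the commands later in this answer, and the column order is taken from the header that "qconf -sc" prints.

```shell
#!/bin/bash
# Append a BOOL complex definition to a dump of the complex list,
# unless a line for that name is already present.
# add_complex_line is a hypothetical helper. Column order:
# name shortcut type relop requestable consumable default urgency
add_complex_line() {
    local file="$1" name="$2" shortcut="$3"
    if grep -q "^${name}[[:space:]]" "$file"; then
        return 0   # already defined, nothing to do
    fi
    printf '%s %s BOOL == YES NO 0 0\n' "$name" "$shortcut" >> "$file"
}

# On a real cluster (requires manager rights), the workflow would be:
# qconf -sc > complexes.txt
# add_complex_line complexes.txt my_test m_t
# qconf -Mc complexes.txt
```

Review complexes.txt before loading it back, since "qconf -Mc" replaces the complex configuration with the contents of that file.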

The next step is probably easier to do from the command line, but you can do it in qmon as well. You need to add your new complex to each host that you're going to allow your job to run on. In qmon, go to the host configuration, select execution host, and open each host in turn, clicking on the consumables/fixed attributes tab and adding the complex that you configured above with "True" as the value. From the command line, you can get a list of your execution hosts with "qconf -sel". You can filter out the host(s) you don't want with grep and pass the rest to a loop, like this:

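# Append my_test=True to complex_values on every execution host except the excluded one.
# Assumes complex_values already holds at least one entry (not NONE) on each host.
qconf -sel | grep -v host_to_exclude | while read -r host; do
    EDITOR="ed" qconf -me "$host" <<EOL
/complex_values/s/$/,my_test=True/
w
q
EOL
done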

This lets you edit the host configuration programmatically, which qconf does not normally allow because it wants to start your editor for you. It works by setting the editor to "ed" (make sure the ed editor is installed; try running it by hand first, and type "q" to get out). ed takes the list of editing commands on its stdin, so we give it three commands. The first edits the complex_values line to append the my_test value. The second writes out the temporary file, and the third quits ed.
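
As an alternative to driving ed, qconf has an -aattr option that adds an entry to a list attribute without opening an editor. The sketch below only prints the commands so they can be reviewed first. print_tag_commands is a hypothetical helper, and whether -aattr behaves identically on your Grid Engine version is an assumption worth checking on a single host.

```shell
#!/bin/bash
# Print one "qconf -aattr" command per execution host read from stdin,
# skipping the excluded host. print_tag_commands is a hypothetical helper.
print_tag_commands() {
    local exclude="$1" host
    # -x matches whole lines and -F treats the name literally,
    # so excluding "n4" does not also drop "n40".
    grep -vxF -- "$exclude" | while read -r host; do
        [ -n "$host" ] || continue
        printf 'qconf -aattr exechost complex_values my_test=True %s\n' "$host"
    done
}

# On a real cluster (requires manager rights):
# qconf -sel | print_tag_commands host_to_exclude        # review the commands
# qconf -sel | print_tag_commands host_to_exclude | sh   # then apply them
```

Printing the commands instead of running them directly keeps the helper easy to check, and the whole-line match avoids the partial-name matches that a plain "grep -v" can produce.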

Once you've done this, submit your jobs with a resource request that requires your new complex:

qsub -q whatever -l my_test=True my_prog.sh

The -l option requests a resource, and my_test=True says the job can only run on hosts that have the complex my_test with a value of True. Since the complex isn't consumable, the scheduler can still run as many jobs on each host as it wants to (up to the slot limit for the host), but it will avoid any host that doesn't have the my_test complex set to True.
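
For a large batch such as the 200 jobs in the question, the request can be applied in a loop. This is a rough sketch, assuming each job is a separate .sh file in one directory. submit_all and the QSUB override are hypothetical names introduced here; setting QSUB=echo gives a dry run that only prints the arguments.

```shell
#!/bin/bash
# Submit every *.sh file in a directory with the my_test=True request.
# submit_all and the QSUB variable are hypothetical; QSUB=echo is a dry run.
submit_all() {
    local dir="$1" queues="$2" script
    for script in "$dir"/*.sh; do
        [ -e "$script" ] || continue   # the directory had no matching scripts
        ${QSUB:-qsub} -q "$queues" -l my_test=True "$script"
    done
}

# Dry run first, then the real submission (needs an SGE cluster):
# QSUB=echo submit_all ./jobs q1.q,q2.q
# submit_all ./jobs q1.q,q2.q
```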



Answer 3:

There is a nice workaround for this.

Create a simple bash script:

#!/bin/bash
sleep 6000 #replace 6000 with any long period of time that will be enough to submit your jobs

Submit this job to the node you wish to exclude, repeating until the copies fully occupy it.

Voilà, your node is excluded.
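
Filling the node can be sketched as a small loop. Everything here is illustrative: print_blocker_commands and blocker.sh are hypothetical names, q1.q@n4 reuses the naming from the answers above, and the slot count is an assumption you have to supply yourself (for example by reading the output of "qstat -f"). The helper only prints the commands.

```shell
#!/bin/bash
# Print the qsub commands needed to fill every slot on one queue instance.
# print_blocker_commands and blocker.sh are hypothetical names; the slot
# count is supplied by the caller and is not looked up from the cluster.
print_blocker_commands() {
    local queue_instance="$1" slots="$2" i=1
    while [ "$i" -le "$slots" ]; do
        printf 'qsub -q %s blocker.sh\n' "$queue_instance"
        i=$((i + 1))
    done
}

# Example (needs a real SGE cluster to run the printed commands):
# print_blocker_commands q1.q@n4 8 | sh
```

Keep in mind that this approach ties up the node's slots for everyone, not only for your own jobs, until the sleep jobs finish or are deleted.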