Question:
I have a batch script which starts off a couple of qsub jobs, and I want to trap when they are all completed.
I don't want to use the -sync option, because I want them to be running simultaneously. Each job has a different set of command line parameters.
I want my script to wait until all the jobs have completed, and then do something. I don't want to use sleep to poll, e.g. checking every 30 s whether certain files have been generated, because that is a drain on resources.
I believe Torque may have some options, but I am running SGE.
Any ideas on how I could implement this please?
Thanks
P.S.
I did find another thread (Link) which had a response:
You can use wait to stop execution until all your jobs are done. You can even collect all the exit statuses and other running statistics (time it took, count of jobs done at the time, whatever) if you cycle around waiting for specific ids.
but I am not sure how to use it without polling on some value. Can bash trap be used? If so, how would I use it with qsub?
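For reference, a minimal sketch of the wait-based approach from that linked thread. Note that it only works for local background processes: qsub returns as soon as a job is queued, so wait cannot track the job itself, which is why the answers below use scheduler-side holds instead.

```shell
#!/bin/bash
# Sketch: wait on local background processes and collect exit statuses.
# This does NOT track qsub jobs (qsub returns immediately after queueing).
pids=()
for t in 1 2; do
    sleep "$t" &          # stand-in for real work
    pids+=("$!")
done

failures=0
for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
done
echo "all jobs finished, failures=$failures"
```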
Answer 1:
Launch your qsub jobs, using the -N option to give them arbitrary names (job1, job2, etc):
qsub -N job1 -cwd ./job1_script
qsub -N job2 -cwd ./job2_script
qsub -N job3 -cwd ./job3_script
Launch your script and tell it to wait until the jobs named job1, job2 and job3 are finished before it starts:
qsub -hold_jid job1,job2,job3 -cwd ./results_script
Answer 2:
qsub -hold_jid job1,job2,job3 -cwd ./myscript
Answer 3:
Another alternative (from here) is as follows:
FIRST=$(qsub job1.pbs)
echo $FIRST
SECOND=$(qsub -W depend=afterany:$FIRST job2.pbs)
echo $SECOND
THIRD=$(qsub -W depend=afterany:$SECOND job3.pbs)
echo $THIRD
The insight is that qsub returns the job ID, which is normally printed to standard output. Instead, capture it in a variable ($FIRST, $SECOND, $THIRD) and use the -W depend=afterany:[JOBIDs] flag when you enqueue your jobs to control the dependency structure of when they are dequeued.
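The same chain can be built in a loop for any number of scripts. A sketch, where the qsub function is a stub that fakes job IDs so it runs without a scheduler; delete the stub on a real Torque/PBS system:

```shell
#!/bin/bash
# Hypothetical generalization of the afterany chain to N job scripts.
# The qsub function below is a STUB that fakes job IDs for illustration;
# remove it on a real cluster so the actual qsub binary is used.
qsub() { echo "${RANDOM}.fakehost"; }

prev=""
chain=()
for script in job1.pbs job2.pbs job3.pbs; do
    if [ -z "$prev" ]; then
        prev=$(qsub "$script")                               # first job: no dependency
    else
        prev=$(qsub -W depend=afterany:"$prev" "$script")    # wait on previous job
    fi
    chain+=("$prev")
done
echo "submitted chain: ${chain[*]}"
```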
Answer 4:
This works in bash, but the ideas should be portable. Use -terse to facilitate building up a string of job IDs to wait on; then submit a dummy job that uses -hold_jid to wait on the previous jobs and -sync y so that qsub doesn't return until it (and thus all its prerequisites) has finished:
# example where each of three jobs just sleeps for some time:
job_ids=$(qsub -terse -b y sleep 10)
job_ids=${job_ids},$(qsub -terse -b y sleep 20)
job_ids=${job_ids},$(qsub -terse -b y sleep 30)
qsub -hold_jid ${job_ids} -sync y -b y echo "DONE"
- The -terse option makes the output of qsub just the job ID.
- The -hold_jid option (as mentioned in other answers) makes a job wait on the specified job IDs.
- The -sync y option (referenced by the OP) asks qsub not to return until the submitted job is finished.
- The -b y option specifies that the command is not a path to a script file (for instance, I'm using sleep 30 as the command).
See the man page for more details.
Answer 5:
If all the jobs share a common pattern in their names, you can provide that pattern when you submit the dependent job. https://linux.die.net/man/1/sge_types shows what patterns you can use. For example:
-hold_jid "job_name_pattern*"
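A sketch of the idea, where qsub is again a stub that just echoes the submission so the example runs without SGE; drop the stub on a real cluster:

```shell
#!/bin/bash
# Hypothetical sketch: related jobs share a name prefix, so a single
# pattern hold covers them all. qsub here is a STUB for illustration.
qsub() { echo "qsub $*"; }

for i in 1 2 3; do
    qsub -N "step1_part$i" -b y sleep 10
done
# the wildcard matches every job whose name starts with step1_
final=$(qsub -N collect -hold_jid "step1_*" -b y echo done)
echo "$final"
```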
Answer 6:
If you have 150 files to process but can only run 15 jobs at a time, while the others wait in the queue, you can set up something like this.
# split my list file into small chunks of 10 files each
awk 'NR%10==1 {x="F"++i;}{ print > ("list_part" x ".txt") }' list.txt
qsub all the jobs so that within each list_part*.txt the first job holds the second one, the second holds the third, and so on:
for list in list_part*.txt ; do
    PREV_JOB=$(qsub start.sh)   # a dummy script, just to start the chain
    for file in $(cat "$list") ; do
        NEXT_JOB=$(qsub -v file="$file" -W depend=afterany:$PREV_JOB myscript.sh)
        PREV_JOB=$NEXT_JOB
    done
done
This is useful if myscript.sh contains a procedure that moves or downloads many files, or otherwise creates heavy traffic on the cluster LAN.
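The split step itself can be checked locally without a cluster. A sketch that builds a dummy list of 150 names and verifies the chunking (close() is added so the script also works in awk implementations with low open-file limits):

```shell
#!/bin/bash
# Demonstrates only the list-splitting step: 150 names split into
# chunks of 10 lines, giving 15 part lists and hence 15 parallel chains.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'file%03d\n' $(seq 1 150) > list.txt
awk 'NR%10==1 { if (f) close(f); f = sprintf("list_partF%d.txt", ++i) }
     { print > f }' list.txt

parts=$(ls list_part*.txt | wc -l)
first_len=$(wc -l < list_partF1.txt)
echo "created $parts part lists of $first_len files each"
```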
Answer 7:
I needed more flexibility, so I built a Python module for this and other purposes here. You can run the module directly as a script (python qsub.py) for a demo.
Usage:
$ git clone https://github.com/stevekm/util.git
$ cd util
$ python
Python 2.7.3 (default, Mar 29 2013, 16:50:34)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import qsub
>>> job = qsub.submit(command = 'echo foo; sleep 60', print_verbose = True)
qsub command is:
qsub -j y -N "python" -o :"/home/util/" -e :"/home/util/" <<E0F
set -x
echo foo; sleep 60
set +x
E0F
>>> qsub.monitor_jobs(jobs = [job], print_verbose = True)
Monitoring jobs for completion. Number of jobs in queue: 1
Number of jobs in queue: 0
No jobs remaining in the job queue
([Job(id = 4112505, name = python, log_dir = None)], [])
Designed with Python 2.7 and SGE since that's what our system runs. The only non-standard Python libraries required are the included tools.py and log.py modules, and sh.py (also included).
Obviously not as helpful if you wish to stay purely in bash, but if you need to wait on qsub jobs then I would imagine your workflow is edging towards a complexity that would benefit from using Python instead.