Parallel processing in shell scripting, 'pid i

Posted 2019-07-27 23:05

I have a question about parallel processing in shell scripting. I have a program, myProgram, which I wish to run multiple times, in a loop within a loop. The script is basically this:

MYPATHDIR=`ls $MYPATH`
for SUBDIRS in $MYPATHDIR; do
  SUBDIR_FILES=`ls $MYPATH/$SUBDIRS`
  for SUBSUBDIRS in $SUBDIR_FILES; do
    find $MYPATH/$SUBDIRS/$SUBSUBDIRS | ./myProgram $MYPATH/$SUBDIRS/outputfile.dat
  done
done

What I wish to do is to take advantage of parallel processing. So I tried this for the middle line to start all the myPrograms at once:

(find $MYPATH/$SUBDIRS/$SUBSUBDIRS | ./myProgram $MYPATH/$SUBDIRS/outputfile.dat &)

However, this began all 300 or so calls to myProgram simultaneously, causing RAM issues etc.

What I would like to do is to run each occurrence of myProgram in the inner loop in parallel, but wait for all of these to finish before moving on to the next outer loop iteration. Based on the answers to this question, I tried the following:

for SUBDIRS in $MYPATHDIR; do
  SUBDIR_FILES=`ls $MYPATH/$SUBDIRS`
  for SUBSUBDIRS in $SUBDIR_FILES; do
    (find $MYPATH/$SUBDIRS/$SUBSUBDIRS | ./myProgram $MYPATH/$SUBDIRS/outputfile.dat &)
  done
  wait $(pgrep myProgram)
done

But I got the following warning/error, repeated multiple times:

./myScript.sh: line 30: wait: pid 1133 is not a child of this shell

...and all the myPrograms were started at once, as before.

What am I doing wrong? What can I do to achieve my aims? Thanks.

2 answers
我只想做你的唯一
Reply #2 · 2019-07-28 00:05

You may find GNU Parallel useful.

parallel -j+0 ./myProgram ::: $MYPATH/$SUBDIRS/*

This runs as many instances of ./myProgram in parallel as there are CPU cores.

来,给爷笑一个
Reply #3 · 2019-07-28 00:08

() invokes a subshell, which in turn invokes find and myProgram, so you're dealing with "grandchildren" processes. You can't wait on grandchildren, only on direct descendants (aka children) of the shell that calls wait.
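
A minimal sketch of that point, under some assumptions: my_program is a hypothetical stand-in for ./myProgram, the directory tree is a throwaway one created with mktemp, and each job writes its own .count file (the script in the question passes a shared outputfile.dat instead). The pipeline is backgrounded without the parentheses, so it remains a child of the script, and a bare wait then blocks until every job from the current outer iteration has finished.

```shell
#!/bin/sh
# Hypothetical stand-in for ./myProgram: reads file names on stdin and
# writes how many it received to the file given as the first argument.
my_program() {
  wc -l > "$1"
}

# Throwaway demo tree (assumption; replace with the real $MYPATH).
MYPATH=$(mktemp -d)
mkdir -p "$MYPATH/a/x" "$MYPATH/a/y" "$MYPATH/b/x" "$MYPATH/b/y"
touch "$MYPATH/a/x/f1" "$MYPATH/a/x/f2" "$MYPATH/a/y/f1" \
      "$MYPATH/b/x/f1" "$MYPATH/b/y/f1"

for subdir in "$MYPATH"/*/; do
  for subsub in "$subdir"*/; do
    # No surrounding ( ... ): the background pipeline is a direct child
    # of this script, so the shell is able to wait on it.
    find "$subsub" | my_program "${subsub%/}.count" &
  done
  # With no arguments, wait blocks until all background children exit.
  wait
  echo "finished $subdir"
done
```

This still starts every inner-loop job for one subdirectory at the same time; it only keeps one outer iteration from overlapping with the next.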
