How to run a given function in Bash in parallel?

Posted 2020-05-14 04:07

There have been some similar questions, but my problem is not "run several programs in parallel" - which can be trivially done with parallel or xargs.

I need to parallelize Bash functions.

Let's imagine code like this:

for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
    # some processing in here - 20-30 lines of almost pure bash
    done
done

Some of the processing requires calls to external programs.

I'd like to run several (4-10) tasks in parallel, each running for a different $i. The total number of elements in $list is > 500.

I know I can put the whole for j ... done loop in an external script and just call that program in parallel, but is it possible to do this without splitting the functionality between two separate programs?

3 Answers
你好瞎i · 2020-05-14 04:32

Edit: Please consider Ole's answer instead.

Instead of a separate script, you can put your code in a bash function, export it, and run it via xargs (the list and other arrays below are filled with placeholder example data):

#!/bin/bash
dowork() {
    sleep $((RANDOM % 10 + 1))
    echo "Processing i=$1, j=$2"
}
export -f dowork   # make the function visible to the child bash processes

list=(a b c)       # example data; substitute your own arrays
other=(x y z)

for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
        printf "%s\0%s\0" "$i" "$j"
    done
done | xargs -0 -n 2 -P 4 bash -c 'dowork "$@"' --
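
The trailing -- is deliberate: with bash -c, the first argument after the command string becomes $0, so the two NUL-delimited values that xargs -n 2 passes along land in $1 and $2, which is exactly what dowork expects. -P 4 caps the number of concurrent bash processes at four, and the NUL delimiters keep whitespace in the values safe.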
\"骚年 ilove
3楼-- · 2020-05-14 04:38

sem is part of GNU Parallel and is made for this kind of situation.

for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
        # some processing in here - 20-30 lines of almost pure bash
        sem -j 4 dolong task
    done
done
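
One detail the snippet above glosses over: sem queues the command and returns immediately, so the script should block at the end until the last jobs finish. Here is a minimal runnable sketch, with made-up array contents and a toy echo/sleep workload standing in for the real processing:

#!/bin/bash
list=(a b c d)   # example data, not from the question
other=(1 2)

for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
        # sem returns immediately; at most 4 jobs run at once
        sem -j 4 "echo processing i=$i j=$j; sleep 1"
    done
done
sem --wait   # block until all queued jobs have finished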

If you prefer the function approach, GNU Parallel can do the dual for loop in one go:

dowork() { 
  echo "Starting i=$1, j=$2"
  sleep 5
  echo "Done i=$1, j=$2"
}
export -f dowork

parallel dowork ::: "${list[@]}" ::: "${other[@]}"
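
By default GNU Parallel starts one job per CPU core. To cap concurrency at the 4-10 tasks the question asks for, pass -j explicitly; the array contents below are invented example data:

list=(a b c)     # example data
other=(x y z)
parallel -j 4 dowork ::: "${list[@]}" ::: "${other[@]}"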
老娘就宠你 · 2020-05-14 04:38

Solution to run multi-line commands in parallel:

for ...your_loop...; do
  test "$(jobs | wc -l)" -ge 8 && wait -n || true  # wait if needed

  {
    any bash commands here
  } &
done
wait

In your case:

for i in "${list[@]}"
do
    for j in "${other[@]}"
    do
        test "$(jobs | wc -l)" -ge 8 && wait -n || true
        {
            your
            multi-line
            commands
            here
        } &
    done
done
wait

If there are already 8 background jobs running, wait -n blocks until at least one of them completes. When fewer than 8 are running, the loop starts new ones asynchronously.
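
To watch the throttling in action, here is a self-contained toy version; the seq range and the echo/sleep workload are invented purely for illustration:

#!/bin/bash
for i in $(seq 1 20)
do
    # if 8 jobs are already running, block until one finishes
    test "$(jobs | wc -l)" -ge 8 && wait -n || true
    {
        echo "start $i"
        sleep $((RANDOM % 3 + 1))
        echo "done  $i"
    } &
done
wait   # wait for any remaining background jobs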

Benefits of this approach:

  1. It's very easy for multi-line commands. All your variables are automatically "captured" in scope; there is no need to pass them around as arguments.
  2. It's relatively fast. Compare this, for example, to parallel (quoting its official man page):

    parallel is slow at starting up - around 250 ms the first time and 150 ms after that.

  3. It only needs bash to work (note that wait -n requires bash 4.3 or newer).

Downsides:

  1. There is a possibility that there were 8 jobs when we counted them, but fewer by the time we started waiting (this happens if a job finishes in the milliseconds between the two commands). This can make us wait even though fewer jobs than the limit are running. However, the loop will resume when at least one job completes, or immediately if there are 0 jobs running (wait -n exits immediately in this case).
  2. If you already have some commands running asynchronously (&) within the same bash script, they count toward the job total, so you'll have fewer worker processes in the loop.