bash script for many files in parallel

Posted 2019-07-06 05:41

I have read similar questions about this topic but none of them help me with the following problem:

I have a bash script that looks like this:

#!/bin/bash

for filename in /home/user/Desktop/emak/*.fa; do
    mkdir "${filename%.*}"
    cd "${filename%.*}"
    mkdir emak
    cd ..
done

This script basically does the following:

  • Iterate through all .fa files in a directory
  • Create a new directory named after each file (without its extension)
  • Go inside the new directory and create a subdirectory called "emak"

The real task does something much more computationally expensive than creating the "emak" directory...

I have thousands of files to iterate through. Since each iteration is independent of the others, I would like to spread the work across processors (I have 24 cores) so that I can process multiple files at the same time.

I have read some previous posts about running jobs in parallel (using GNU Parallel), but I do not see a clear way to apply it in this case.

Thanks

2 answers
劫难
Reply #2 · 2019-07-06 06:03

No need for parallel; you can simply use

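N=10
for filename in /home/user/Desktop/emak/*.fa; do
    mkdir -p "${filename%.*}/emak" &
    # After every Nth job, wait for the whole batch to finish
    (( ++count % N == 0 )) && wait
done
wait    # wait for any jobs left over from the last, partial batch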

The `(( ++count % N == 0 )) && wait` line pauses after every Nth job so that all jobs started so far can complete before the loop continues.
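The batching above waits for the slowest job in each group of N. As a variation (not part of this answer), here is a minimal sketch that keeps up to N jobs running and starts a new one as soon as any job exits. It assumes bash 4.3 or later for `wait -n`. The temporary directory, the sample file names, and the `process_one` function are hypothetical stand-ins for the real files and the real work.

```shell
#!/bin/bash
# Sketch: keep at most N jobs running at once, refilling a slot as soon as
# any job exits. Assumes bash >= 4.3 (for `wait -n`).
N=4

# Hypothetical stand-in for /home/user/Desktop/emak/*.fa
workdir=$(mktemp -d)
touch "$workdir"/sample{1..10}.fa

process_one() {
    # Placeholder for the expensive per-file work
    mkdir -p "${1%.*}/emak"
}

for filename in "$workdir"/*.fa; do
    # While N jobs are already running, block until one of them exits
    while (( $(jobs -rp | wc -l) >= N )); do
        wait -n
    done
    process_one "$filename" &
done
wait    # let the remaining jobs finish

created=$(find "$workdir" -mindepth 2 -maxdepth 2 -type d -name emak | wc -l)
echo "created $created emak directories"
rm -rf "$workdir"
```

Compared with the fixed batches, this costs one extra `jobs` call per file, but no slot sits idle while a slow job in the same batch is still running.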

成全新的幸福
Reply #3 · 2019-07-06 06:09

Something like this with GNU Parallel, whereby you create and export a bash function called doit:

#!/bin/bash

doit() {
    local dir=${1%.*}       # strip the .fa extension
    mkdir "$dir"
    cd "$dir" || return     # skip this file if the directory is unusable
    mkdir emak
}
export -f doit
parallel doit ::: /home/user/Desktop/emak/*.fa

The benefit of this approach grows when your "computationally expensive" part takes longer, and especially when its duration varies. If each job takes, say, up to 10 seconds, GNU Parallel starts the next job as soon as any one of the N running jobs finishes, rather than waiting for all N to complete before starting the next batch of N jobs.

As a crude benchmark, this takes 58 seconds:

#!/bin/bash

doit() {
   echo "$1"
   # Sleep a random whole number of seconds, from 0 to 10
   sleep $((RANDOM*11/32768))
}
export -f doit
parallel -j 10 doit ::: {0..99}

and this directly comparable batch version takes 87 seconds:

#!/bin/bash
N=10
for i in {0..99}; do
    echo "$i"
    sleep $((RANDOM*11/32768)) &
    (( ++count % N == 0)) && wait
done
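
The gap between those two timings can be reasoned about without sleeping at all. The sketch below is a simplified model, not GNU Parallel's actual implementation. It takes a made-up list of job durations and adds up the wall time for "batches of N" versus "give the next job to whichever slot frees up first". The `durations` array and `N=2` are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: compare total wall time of "batches of N" against "refill a slot as
# soon as it frees up", using made-up job durations instead of real work.
N=2
durations=(10 1 1 1 1 10 1 1)   # hypothetical durations in seconds, uneven on purpose

# Batch strategy: each batch lasts as long as its slowest job
batch_total=0
for (( i = 0; i < ${#durations[@]}; i += N )); do
    longest=0
    for d in "${durations[@]:i:N}"; do
        (( d > longest )) && longest=$d
    done
    (( batch_total += longest ))
done

# Rolling strategy: each job goes to the slot that becomes free first
slots=()
for (( s = 0; s < N; s++ )); do slots[s]=0; done
for d in "${durations[@]}"; do
    earliest=0
    for (( s = 1; s < N; s++ )); do
        (( slots[s] < slots[earliest] )) && earliest=$s
    done
    (( slots[earliest] += d ))
done
rolling_total=0
for t in "${slots[@]}"; do
    (( t > rolling_total )) && rolling_total=$t
done

echo "batches of $N: ${batch_total}s"
echo "rolling pool:  ${rolling_total}s"
```

With these durations the batch strategy totals 22 seconds and the rolling strategy 14 seconds, because each long job holds up an entire batch in the first case but only a single slot in the second.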