bash script for many files in parallel

I have read similar questions about this topic but none of them help me with the following problem:

I have a bash script that looks like this:

#!/bin/bash

for filename in /home/user/Desktop/emak/*.fa; do
    mkdir "${filename%.*}"
    cd "${filename%.*}"
    mkdir emak
    cd ..
done

This script basically does the following:

Iterate through all files in a directory
Create a new directory with the name of each file
Go inside the new directory and create a new directory called "emak"

The real task does something much more computationally expensive than creating the "emak" directory...

I have thousands of files to iterate through. Since each iteration is independent of the others, I would like to split the work across processors (I have 24 cores) so that I can process multiple files at the same time.

I have read some previous posts about running jobs in parallel (using GNU Parallel), but I do not see a clear way to apply it in this case.

thanks

Tags: bash parallel-processing

2 Answers

劫难

Reply #2 · 2019-07-06 06:03

No need for parallel; you can simply use

N=10
for filename in /home/user/Desktop/emak/*.fa; do
    mkdir -p "${filename%.*}/emak" &
    (( ++count % N == 0)) && wait
done

The second line of the loop body calls wait after every Nth job has been started, so the loop pauses until all jobs in that batch have completed before continuing.
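One caveat with this pattern: jobs started after the last full batch are never waited for, so a final wait after the loop is probably worth adding. Here is a minimal sketch of that, assuming a hypothetical process_one function as a stand-in for the expensive step and a throwaway directory of sample files in place of the real path:

```shell
#!/bin/bash
# Sketch only: process_one and the sample files are illustrative stand-ins.

process_one() {
    # Placeholder for the expensive per-file work.
    mkdir -p "${1%.*}/emak"
}

workdir=$(mktemp -d)      # throwaway directory for the demo
touch "$workdir"/a.fa "$workdir"/b.fa "$workdir"/c.fa

N=2        # maximum number of background jobs per batch
count=0
for filename in "$workdir"/*.fa; do
    process_one "$filename" &
    # After every Nth launch, block until the whole batch has finished.
    (( ++count % N == 0 )) && wait
done
wait       # collect the jobs left over from the final, partial batch

ls "$workdir"
```

With three files and N=2, the third job falls outside any full batch, and the trailing wait is what guarantees it has finished before ls runs.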


成全新的幸福

Reply #3 · 2019-07-06 06:09

Something like this with GNU Parallel, whereby you create and export a bash function called doit:

#!/bin/bash

doit() {
    local dir=${1%.*}
    mkdir "$dir"
    # Stop here if the directory could not be entered
    cd "$dir" || return
    mkdir emak
}
export -f doit
parallel doit ::: /home/user/Desktop/emak/*.fa
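If GNU Parallel is not installed, a similar effect can usually be had with xargs. This is a different tool, not what the answer above uses, and the -P and -0 options are GNU/BSD extensions rather than POSIX. A rough sketch, assuming a throwaway directory of sample files and an arbitrary limit of four jobs:

```shell
#!/bin/bash
# Sketch only: xargs -P and -0 are GNU/BSD extensions, not part of POSIX.

doit() {
    local dir=${1%.*}
    mkdir -p "$dir/emak"
}
export -f doit            # make doit visible to the child bash processes

demo=$(mktemp -d)         # throwaway directory for the demo
touch "$demo"/x.fa "$demo"/y.fa "$demo"/z.fa

# NUL-delimited names survive spaces in paths; -n 1 passes one file per
# invocation and -P 4 keeps up to four invocations running at once.
printf '%s\0' "$demo"/*.fa |
    xargs -0 -n 1 -P 4 bash -c 'doit "$1"' _

ls "$demo"
```

The trailing _ fills $0 of the inline bash -c script so that the file name arrives as $1.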

You will really see the benefit of this approach if the time taken by your "computationally expensive" part is longer, or especially if it is variable. If it takes, say, up to 10 seconds and varies from file to file, GNU Parallel will start the next job as soon as any one of the N running jobs finishes, rather than waiting for all N to complete before starting the next batch of N jobs.

As a crude benchmark, this takes 58 seconds:

#!/bin/bash

doit() {
   echo $1
   # Sleep up to 10 seconds
   sleep $((RANDOM*11/32768))
}
export -f doit
parallel -j 10 doit ::: {0..99}

and this directly comparable version takes 87 seconds:

#!/bin/bash
N=10
for i in {0..99}; do
    echo $i
    sleep $((RANDOM*11/32768)) &
    (( ++count % N == 0)) && wait
done
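The gap between the two timings comes from batching: each batch lasts as long as its slowest job. Plain bash can get closer to the GNU Parallel behaviour by refilling a slot as soon as any job exits. The sketch below assumes bash 4.3 or newer (for wait -n) and uses a hypothetical work function with short sleeps as a stand-in for real jobs:

```shell
#!/bin/bash
# Sketch only: requires bash 4.3 or newer for `wait -n`.

work() {
    # Stand-in for a job of variable length (0.0 to 0.2 seconds).
    sleep "0.$(( RANDOM % 3 ))"
    echo "$1" >> "$log"
}

log=$(mktemp)   # each finished job appends its number here
N=4             # maximum number of jobs running at any moment
running=0
for i in {0..11}; do
    if (( running >= N )); then
        wait -n             # block until any one background job exits
        (( running-- ))
    fi
    work "$i" &
    (( ++running ))
done
wait            # wait for the jobs still running after the loop ends

sort -n "$log" | tr '\n' ' '
```

The counter only tracks how many slots are believed to be busy, so it errs on the side of running fewer than N jobs, never more.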

