I have read similar questions about this topic but none of them help me with the following problem:
I have a bash script that looks like this:
#!/bin/bash
for filename in /home/user/Desktop/emak/*.fa; do
    mkdir "${filename%.*}"
    cd "${filename%.*}"
    mkdir emak
    cd ..
done
This script basically does the following:
- Iterate through all the .fa files in a directory
- Create a new directory named after each file (without its extension)
- Go inside the new directory and create a subdirectory called "emak"
The real task does something much more computationally expensive than creating the "emak" directory...
I have thousands of files to iterate through. As each iteration is independent of the previous ones, I would like to split the work across different processors (I have 24 cores) so I can process multiple files at the same time.
I have read some previous posts about running jobs in parallel (using GNU Parallel), but I do not see a clear way to apply it in this case.
thanks
No need for parallel; you can simply use background jobs with a periodic wait, as in the sketch below: the second line of the loop body pauses every Nth job to allow all the previous jobs to complete before continuing.
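The code block for this approach is missing from the text; the following is a minimal sketch of the pattern being described, with the batch size N (set to 24 for your machine) and the counter i as assumed names, and the two mkdir calls collapsed into one mkdir -p:

N=24
i=0
for filename in /home/user/Desktop/emak/*.fa; do
    mkdir -p "${filename%.*}/emak" &   # run each iteration in the background
    (( ++i % N == 0 )) && wait         # every Nth job, wait for the whole batch to finish
done
wait                                    # wait for the final, possibly partial batch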
Something like this with GNU Parallel, whereby you create and export a bash function called doit:
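The body of doit is not reproduced in the text; a minimal sketch, assuming the directory creation from the question stands in for the real, computationally expensive work (export -f makes the function visible to the shells GNU Parallel spawns):

#!/bin/bash

# One unit of work: make a directory named after the .fa file, with an "emak" subdirectory inside
doit() {
    dir="${1%.*}"
    mkdir -p "$dir/emak"
}
export -f doit

# Run doit once per file; GNU Parallel keeps one job per core running by default
parallel doit ::: /home/user/Desktop/emak/*.fa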
You will really see the benefit of this approach if the time taken by your "computationally expensive" part is long, or especially if it is variable. If a job takes, say, up to 10 seconds and the duration varies, GNU Parallel will submit the next job as soon as the shortest of the N running processes completes, rather than waiting for all N to complete before starting the next batch of N jobs.
As a crude benchmark, this takes 58 seconds:
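The benchmark script is not included in the text; purely as an illustration of the shape of such a test (not the script that produced the 58-second figure), a GNU Parallel run over jobs of varying length might look like this:

#!/bin/bash

# Simulated job whose duration depends on its argument (0-9 seconds)
doit() {
    sleep $(( $1 % 10 ))
}
export -f doit

# Keep 24 jobs running at once; a new job starts as soon as any slot frees up
parallel -j 24 doit ::: {1..100}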
and this is directly comparable and takes 87 seconds:
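Again, the original snippet is missing; an illustrative batch-and-wait version of the same hypothetical workload, in which every batch of 24 jobs must finish before the next batch starts:

#!/bin/bash

# Same simulated jobs, submitted in fixed batches of 24
N=24
i=0
for n in {1..100}; do
    sleep $(( n % 10 )) &
    (( ++i % N == 0 )) && wait   # the whole batch must finish before continuing
done
wait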