Run a script on multiple cores with parallel processing

Posted 2019-09-05 08:34

Question:

I am writing a script that takes a range of parameters from the command line:

script.pl start end 

for ($k1=$start; $k1<$end; $k1 += 0.001) {
  for ($k2=$start; $k2<$end; $k2 += 0.01) {
    for ($k3=$start; $k3<$end; $k3 += 0.001) {
      for ($k4=$start; $k4<$end; $k4 += 0.001) {
        for ($k5=$start; $k5<$end; $k5 += 0.001) {
...

}}}}}

If I set the parameters to run from 0 to 1, it takes a long time. The simplest way is to split the range into smaller intervals, like this:

script.pl 0 0.01 
script.pl 0.01 0.02
...
script.pl 0.9 1

But then I have to open 100 screen sessions at the same time!

Can somebody show me how to do this automatically?

I wasn't sure what the best way would be, which is why I'm asking. I have 256 cores.

Answer 1:

The really critical question when looking at parallel code is dependencies. Because your script can be subdivided by range, I'm going to assume the iterations are independent of one another and you're not doing anything complicated inside the loop.

But because you're five loops deep and stepping by 0.001 (0.01 for $k2), going from 0 to 1 means a LOT of iterations: 100,000,000,000,000 (10^14) of them, to be precise.
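That figure follows from the step sizes: four of the loops take 1,000 steps each over 0 to 1, and the $k2 loop, with its 0.01 step, takes 100. A quick shell check, assuming the full 0 to 1 range (total_iterations is just an illustrative name, not part of your script):

```shell
# total_iterations: how many times the innermost loop body runs for 0..1.
# (Illustrative helper; the step counts come from the loops in the question.)
total_iterations() {
    fine=1000     # steps of 0.001 between 0 and 1 (k1, k3, k4, k5)
    coarse=100    # steps of 0.01 between 0 and 1 (k2)
    echo $(( fine * coarse * fine * fine * fine ))
}

total_iterations   # prints 100000000000000
```

Even spread perfectly over 256 cores, that is still roughly 3.9 * 10^11 loop bodies per core, so the cost of a single loop body matters a great deal.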

To parallelise, I would personally suggest you 'unroll' the outer loop and use Parallel::ForkManager.

E.g.

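#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my ( $start, $end ) = @ARGV;    # range from the command line, as in the question
my $CPU_count = 256;            # cap on concurrent child processes

my $fork_manager = Parallel::ForkManager->new($CPU_count);

for ( my $k1 = $start; $k1 < $end; $k1 += 0.001 ) {
    # Run outer loop in parallel: the parent gets the child's PID back and
    # moves on to the next $k1; the child gets 0 and falls through.
    my $pid = $fork_manager->start and next;

    for ( my $k2 = $start; $k2 < $end; $k2 += 0.01 ) {
        for ( my $k3 = $start; $k3 < $end; $k3 += 0.001 ) {
            for ( my $k4 = $start; $k4 < $end; $k4 += 0.001 ) {
                for ( my $k5 = $start; $k5 < $end; $k5 += 0.001 ) {
                    ...;
                }
            }
        }
    }

    $fork_manager->finish;    # the child process exits here
}

# The parent blocks here until every remaining child has finished
$fork_manager->wait_all_children;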

For each iteration of the outer loop, this forks your process and runs the four inner loops in a separate child process. It caps out at 256 concurrent processes; you should match this to the number of CPUs you have available.

Bear in mind, though, that this only really works for simple CPU-bound tasks. If you're doing a lot of disk I/O or trying to share memory between processes, it won't work nearly as well.

Also note that if the outer loop has fewer steps than you have CPUs, some CPUs will sit idle and it won't parallelise quite so well.
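To see how that plays out, here is a rough sketch that estimates the number of outer-loop steps for a given range and compares it with the CPU count. The helper names outer_steps and busy_workers are made up for illustration, and the step count is rounded, so treat it as an estimate rather than an exact iteration count:

```shell
# outer_steps START END STEP: approximate iteration count of the outer loop.
# Adding 0.5 before truncating rounds to the nearest whole step.
outer_steps() {
    awk -v s="$1" -v e="$2" -v step="$3" 'BEGIN { printf "%d\n", (e - s) / step + 0.5 }'
}

# busy_workers STEPS CPUS: how many processes can actually run at once.
busy_workers() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

steps=$(outer_steps 0 0.01 0.001)
echo "outer steps: $steps, busy workers: $(busy_workers "$steps" 256)"
# prints: outer steps: 10, busy workers: 10
```

So running the script on 0 to 0.01 would keep only about 10 of the 256 cores busy, whereas the full 0 to 1 range gives about 1,000 outer steps, which is plenty to keep every core occupied.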

I'd also note that $k2 steps by 0.01 where the other loops step by 0.001. I've copied that from your source, but it may be a typo.



Answer 2:

I'm not sure what you mean, but this will launch 100 jobs in the background in parallel. Note that it can bring your computer to its knees, depending on your hardware:

$ seq 0 0.01 0.98 | perl -lne 'print "$_ ",$_+0.01' |
    while read start end; do script.pl $start $end & done; script.pl 0.99 1

The idea is to use seq to generate the intervals, piped through a little perl script that prints out the pairs. These are then read by the bash loop and the script is launched with the relevant parameters.

Note, however, that this is far from an elegant way of achieving your goal. You might want to look into GNU Parallel or the various parallelization tools available for Perl itself.
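If you do start the jobs in the background with &, the shell's wait builtin lets the calling script block until all of them have finished. A minimal sketch, where run_chunk is a hypothetical stand-in for "script.pl START END" that only reports its arguments:

```shell
# run_chunk stands in for "script.pl START END"; here it just reports its arguments.
run_chunk() {
    echo "done $1 $2"
}

# Read "start end" pairs from stdin, start one background job per pair,
# then wait so the caller only continues once every job has exited.
run_all() {
    while read -r start end; do
        run_chunk "$start" "$end" &
    done
    wait
}

printf '%s\n' '0.00 0.01' '0.01 0.02' '0.02 0.03' | run_all | sort
```

The sort is only there because background jobs can finish in any order. This still starts every job at once, so it does nothing to limit the load on the machine.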



Answer 3:

Variant of terdon's answer:

paste <(seq -w 0 0.01 0.99) <(seq -w 0.01 0.01 1) | xargs -n2 -P 255 ./script.pl

This runs up to 255 processes in parallel, invoked as follows:

./script.pl 0.00 0.01
./script.pl 0.01 0.02
...
...
./script.pl 0.98 0.99
./script.pl 0.99 1.00
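Because seq works with fractional steps, the endpoints depend on floating-point rounding. One way around that, sketched here under the assumption that 0.01-wide chunks are what you want, is to count with an integer and format each boundary once. The helper name intervals is made up, and echo stands in for actually running ./script.pl:

```shell
# intervals N: print N "start end" pairs covering 0..1, computed from an
# integer counter so no floating-point error accumulates across steps.
intervals() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf "%.2f %.2f\n", i / n, (i + 1) / n }'
}

# echo stands in for running ./script.pl; -P caps the number of concurrent processes.
intervals 100 | xargs -n 2 -P 8 echo ./script.pl | sort | tail -n 1
# prints: ./script.pl 0.99 1.00
```

Dropping the echo and raising -P to suit your core count would run the real script; the -P 8 here is arbitrary.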


Tags: linux perl shell