use strict;
use warnings;
use Parallel::ForkManager;

my $number_running = 0;
my $pm = Parallel::ForkManager->new(30);
$pm->run_on_start( sub { ++$number_running; } );
$pm->run_on_finish( sub { --$number_running; } );

for (my $i = 0; $i <= 100; $i++) {
    if ($number_running == 5) { while ($number_running > 0) {} } # waits forever
    $pm->start and next;
    print $i;
    $pm->finish;
}
The above code uses Parallel::ForkManager to execute code in a for loop using parallel processes. It counts how many child processes are running and sets the $number_running variable accordingly. Once 5 child processes are running, I would like it to wait until 0 child processes are running before continuing.

The first line in the for loop is designed to achieve this, but it waits forever on that line. It's as if the change to the variable made by the child processes is not visible to that line of code. What am I doing wrong? Note: I am aware of wait_all_children, but I don't want to use it.
In short

The callback run_on_finish doesn't normally get triggered right as each child exits, so $number_running doesn't get reduced and thus it can't control the loop. Ways to fix this:

- use reap_finished_children in order to communicate as individual children exit, so that run_on_finish indeed gets to run as each child exits
- use wait_for_available_procs to wait for the whole batch to finish before starting a new one

The callback run_on_start runs with every new process, and the counter is incremented. But nothing inside the while loop triggers the callback run_on_finish, so the counter is never decremented there. Thus once it reaches 5, the code sits in the while loop forever. Note that the parent and the children cannot directly change each other's variables, since they are separate processes.

The callback run_on_finish is commonly triggered by calling wait_all_children after all processes have been forked. Its job is also done when the maximum number of processes is running and one exits; this is done in start, by a call to wait_one_child (which calls on_finish, see below). Or, it can be done at will by calling the reap_finished_children method.
This resolves the main concern, as clarified in comments: how to communicate as individual children exit, without resorting to wait_all_children. Here is an example of how to use it so that the callback runs right as a child exits. A good deal of the code is merely for diagnostics (prints).
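A minimal sketch of such a program, assuming Parallel::ForkManager is installed. The threshold of 5 comes from the question; the 10 children, the sleep times and the @finished array are illustrative choices. The waiting loop calls reap_finished_children on each pass, so run_on_finish fires (and decrements $number_running) as each child exits, and a short sleep keeps the loop from hogging the CPU.

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Time::HiRes qw(sleep);

$| = 1;                            # autoflush, so children don't re-print buffered output

my $MAX = 30;                      # maximum processes, as in the question
my $pm  = Parallel::ForkManager->new($MAX);

my $number_running = 0;            # only ever changed in the parent, by the callbacks
my @finished;                      # diagnostics: exit codes seen by run_on_finish

$pm->run_on_start( sub { ++$number_running } );
$pm->run_on_finish( sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    --$number_running;
    push @finished, $exit_code;
    print "child $pid exited with $exit_code ($data->[0])\n" if ref($data) eq 'ARRAY';
});

for my $i (1 .. 10) {              # fewer than $MAX children, to keep output short
    # Once 5 children run, wait for all of them, reaping each one as it exits.
    if ($number_running >= 5) {
        while ($number_running > 0) {
            my $curr = $number_running;
            $pm->reap_finished_children;        # non-blocking; runs run_on_finish
            print "running: $curr -> $number_running\n" if $number_running != $curr;
            sleep 0.1;                          # don't spin the CPU
        }
    }
    $pm->start and next;                        # parent: go fork the next child
    sleep 0.1 * $i;                             # child: simulate some work
    $pm->finish(10 * $i, [ "child $i done" ]);  # exit code and data for the parent
}
$pm->wait_all_children;            # reap whatever is still running at the very end
```

The final wait_all_children is only there so the program does not exit with children still running; the batch waiting itself relies on reap_finished_children.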
Using this method is equivalent to calling waitpid -1, POSIX::WNOHANG in a loop after fork. The example forks fewer than the maximum (30) processes, to make the output easier to follow and to demonstrate that the callback runs right as a child exits; change these numbers to see its full operation. Each child exits with 10*$i, so that children can be tracked in the output. The data returned in an anonymous array [...] is a descriptive string. As soon as reap_finished_children completes, $number_running has been reduced, in the callback; this is why we have the $curr variable, again for diagnostics.
The direct question is how to wait for the whole batch to finish before starting a new one. This can be done directly with wait_for_available_procs($n).
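A sketch of batching this way, assuming a Parallel::ForkManager version that provides wait_for_available_procs. The batch size of 5, the 12 children and the @reaped_batches array are illustrative. Each child exits with its batch number, so the order in which run_on_finish sees them shows whether a batch overlapped the next one.

```perl
use strict;
use warnings;
use Parallel::ForkManager;
use Time::HiRes qw(sleep);

$| = 1;                               # autoflush before forking

my $MAX = 5;                          # batch size = max processes (illustrative)
my $pm  = Parallel::ForkManager->new($MAX);

my @reaped_batches;                   # batch number of each child, in reaping order
$pm->run_on_finish( sub {
    my ($pid, $exit_code) = @_;
    push @reaped_batches, $exit_code; # the child exits with its batch number
});

for my $i (0 .. 11) {
    # At the start of every new batch, block until all $MAX slots are free,
    # i.e. until each child of the previous batch has exited and been reaped.
    $pm->wait_for_available_procs($MAX) if $i > 0 && $i % $MAX == 0;
    $pm->start and next;
    sleep 0.05 * ($i % $MAX + 1);     # child: simulate work of varying length
    $pm->finish( int($i / $MAX) );    # exit code = batch number (0, 1 or 2)
}
$pm->wait_all_children;               # the last, partial batch
print "batches in reaping order: @reaped_batches\n";
```

Because a new batch is never started before the previous one is fully reaped, @reaped_batches comes out in non-decreasing order.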
If $MAX, the maximum number of processes given to new, is used for $n, that many slots become available only once the whole batch has completed. What to use for $n can also be decided at runtime.

Some details of the module's operation
When a child exits, the SIGCHLD signal is sent to the parent, which must handle it (by reaping the child) in order to know that the child is gone, and to avoid zombies in the first place. This is done by using wait or waitpid, either in the code or in a SIGCHLD handler (but only in one place). See fork, Signals in perlipc, waitpid and wait.

We see from P::FM's source that this is done in wait_one_child (via the _waitpid sub), which is used in wait_all_children.
The method reap_finished_children used above is a synonym for the module's wait_children method, which also reaps through wait_one_child. The method wait_one_child, which reaps an exited child, is used by start when the maximum number of processes is filled and one exits. This is how the module knows when it can start another process while respecting its maximum. (It is also used by a few other routines that wait for processes.) And this is when run_on_finish gets triggered, by $s->on_finish( $kid, ... ).
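The store-and-dispatch pattern is simple enough to mimic in a few lines. Toy::Manager below is a hypothetical class, not Parallel::ForkManager's code; it only mirrors the shape described in this section: run_on_finish stores a coderef under the object's on_finish key, and on_finish calls that coderef when a child has been reaped.

```perl
use strict;
use warnings;

# Hypothetical toy class, NOT Parallel::ForkManager: it only mirrors the
# pattern of storing a callback and dispatching to it later.
package Toy::Manager;

sub new { my ($class) = @_; return bless {}, $class }

# Store the user's coderef under the object's on_finish key.
sub run_on_finish {
    my ($s, $code) = @_;
    $s->{on_finish} = $code;
}

# Called internally once a child has been reaped; runs the stored callback, if any.
sub on_finish {
    my ($s, @args) = @_;
    my $code = $s->{on_finish} or return;
    $code->(@args);
}

package main;

my $number_running = 2;
my $m = Toy::Manager->new;
$m->run_on_finish( sub { --$number_running } );
$m->on_finish( 12345, 0 );    # simulate reaping one (made-up) child PID with exit code 0
```

Until something calls on_finish, the stored callback never runs, which is exactly the situation of the busy while loop in the question.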
The callback is in the coderef $code, retrieved from the object's on_finish key, which itself is set in the sub run_on_finish. This is how the callback is set up, once that sub runs. The methods available to the user for triggering this are wait_all_children and reap_finished_children.

Since neither of these is used in the posted code (and start, which also reaps, is never reached while the parent spins in the while loop), $number_running is not updated, so the while loop is infinite. Recall that the variable $number_running in the parent cannot be directly changed by child processes.
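Both points can be seen in isolation with a minimal sketch in plain Perl, without Parallel::ForkManager, assuming a Unix-like system. A child changes its own copy of $number_running without affecting the parent, and the parent's counter only goes down when the parent itself reaps a child with waitpid(-1, WNOHANG), roughly the work reap_finished_children does for you. The three children and their exit codes are illustrative.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep);

$| = 1;                      # autoflush, so children don't re-print buffered output

my $number_running = 0;      # lives in the parent; each child gets its own copy
my @reaped;                  # exit codes collected by the parent

for my $i (1 .. 3) {
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {                 # child
        $number_running = -100;      # changes only the child's copy
        sleep 0.1 * $i;
        exit $i;
    }
    ++$number_running;               # parent: count the started child
}

# Parent: poll until every child has been reaped. waitpid(-1, WNOHANG)
# returns a child PID if one has exited, 0 if children are still running,
# and -1 if there are no children left.
while ($number_running > 0) {
    my $kid = waitpid(-1, WNOHANG);
    if ($kid > 0) {
        --$number_running;           # the "on finish" step, done by the parent
        push @reaped, $? >> 8;       # exit code of the reaped child
    }
    elsif ($kid == -1) {
        last;                        # no children left (not expected here)
    }
    else {
        sleep 0.05;                  # nothing exited yet; avoid a hot loop
    }
}
print "reaped exit codes: @{[ sort { $a <=> $b } @reaped ]}\n";
```

The child's assignment of -100 never shows up in the parent: the parent's counter returns to 0 only through its own reaping.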