So I have a Perl script that goes out and wgets pieces of a stream (I don't know how many pieces there are up front), but I can't think of a good way to know when to STOP wget'ing. Right now, if wget returns unsuccessfully, we create a file called "END", and once the main program sees it, it stops the loop. Is there a better way to go about doing this?
Obviously it would be easy if it were done sequentially instead of concurrently, but I was trying to make the download as fast as possible.
my $link = $ARGV[0];
my ($url) = $link=~ m/(.+-)\d+.ts/i;
my $num = 0;
#while the file END doesn't exist
my @pids;
while (! -e "END") {
    #create the URL, increment by 1
    my $video=$url.++$num.".ts";
    die "could not fork" unless defined (my $pid = fork());
    #child process goes until wget returns invalid, create END
    if (not $pid) {
        system ("wget -T 5 -t 5 $video");
        `touch END` if $? != 0;
        exit;
    }
    push @pids, $pid;
}
#parent process still running, waiting for the same END file.
for my $pid (@pids) { waitpid $pid,0; }
print "pids finished\n";
sleep 1;
`rm END`;
You don't indicate how many processes there may be, but no resource is unlimited and you should limit the number or you'll see a rapid degradation of performance as you reach saturation.
This is even more so when going out on the network since you may be annoying a server (and things will also stop being faster quite soon). Perhaps run up to a few tens of processes at a time?
Then one option is to limit the number of parallel downloads using Parallel::ForkManager. It has a way to return data to the parent, so a child can report failure. Its run_on_finish method can then check each batch for such a failure flag and set a variable that controls the forking.
This stops forking after the batch of jobs within which $i == 3. Add prints for diagnostics.
The "callback" run_on_finish runs only once a whole batch completes.† The anonymous sub in it always receives 6 arguments, but only the first one, the child pid, is always defined. The last one holds data possibly passed by the child, and when that happens we set the flag. A child can return data by passing a reference to the finish method. To only indicate a condition we can simply pass anything; I use \$ret as an example of passing actual data.
See documentation for more, but this does what you ask. For far more see Forks::Super.
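The Parallel::ForkManager approach described above might look roughly like the sketch below. It is not the exact code this answer refers to: the cap of 4 processes and the fetch_piece helper (a stand-in that pretends piece 3 is the first to fail, so it runs without a network) are assumptions made for illustration.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Assumed cap on simultaneous children; tune for your network and server.
my $pm = Parallel::ForkManager->new(4);

my $stop_forking = 0;

# Runs in the parent each time a child is reaped. Of the six arguments,
# the last is defined only when the child passed a reference to finish().
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    $stop_forking = 1 if defined $data;
});

my $i = 0;
while (not $stop_forking) {
    ++$i;                        # incremented in the parent, before forking
    $pm->start and next;         # parent continues with the next iteration

    # Child from here on.
    my $ret = fetch_piece($i);   # 0 on success, non-zero on failure
    if ($ret) { $pm->finish(0, \$ret) }   # report failure, passing data back
    else      { $pm->finish }             # success: nothing to report
}
$pm->wait_all_children;

# Hypothetical stand-in for the download: pretends the stream has two
# pieces, so piece 3 is the first to fail. A real version might instead
# return the result of system('wget', '-T', 5, '-t', 5, $url).
sub fetch_piece {
    my ($n) = @_;
    return $n >= 3 ? 1 : 0;
}
```

Since the flag is only set once the failed child has been reaped, a few pieces past the last valid one may still be requested; with a small cap that is a handful of quick failures at most.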
If you wish to fork as you do, I'd first put in a little sleep there, so you don't bombard the server with too many requests.
Your children can talk with the parent using socketpair. The failed child can write while all others can simply close their socket. The parent keeps checking, for example with can_read from IO::Select. There is an example in perlipc. Since you only need children to write to the parent, a pipe would suffice as well.
You can also do it with a signal. The child that fails sends (say) SIGUSR1 to the parent, which the parent traps, setting a global variable that controls further forks. This is simpler as the parent only traps that one signal and doesn't care where it comes from. See perlipc and the sigtrap pragma.
You can also use a file, much like you do, which is probably simplest since here you don't care about race issues (whether children's writes overlap), but only about an empty file showing up.
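As a minimal sketch of the signal variant mentioned above: the parent installs a SIGUSR1 handler that sets a flag, and a child whose download fails signals the parent. The fetch_piece helper, the 0.1 second pause, and the $max_forks safety limit are assumptions added so the example runs on its own; they are not part of the question's code.

```perl
use strict;
use warnings;

my $stop_forking = 0;
$SIG{USR1} = sub { $stop_forking = 1 };   # parent only needs to set the flag

my $max_forks = 20;    # assumed safety limit for this demo
my $num       = 0;
my @pids;

while (not $stop_forking and $num < $max_forks) {
    ++$num;
    my $pid = fork();
    die "could not fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: try one piece, tell the parent if it failed.
        kill 'USR1', getppid() unless fetch_piece($num);
        exit 0;
    }

    push @pids, $pid;
    # Brief pause between forks; it may be cut short when the signal arrives.
    select(undef, undef, undef, 0.1);
}

waitpid($_, 0) for @pids;

# Hypothetical stand-in for the wget call: returns true on success and
# pretends that only pieces 1 to 3 exist.
sub fetch_piece {
    my ($n) = @_;
    return $n <= 3;
}
```

The pause is what keeps the parent from forking far past the end of the stream before the signal is delivered; it does not cap the number of simultaneous children when downloads are slow.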
However, in all these you'd also want to limit the number of parallel processes.
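One way to combine the pipe idea with such a limit is sketched below. All children share the write end of a single pipe, a failing child writes its piece number, and the parent polls with can_read from IO::Select before each fork. The cap of 4 children and the fetch_piece helper are assumptions for illustration only.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe failed: $!";
$writer->autoflush(1);
my $sel = IO::Select->new($reader);

my $max_children = 4;    # assumed cap on simultaneous children
my %running;             # pids of children not yet reaped
my $num = 0;
my $failed_piece;

while (1) {
    # Non-blocking check: has any child reported a failure?
    if ($sel->can_read(0)) {
        chomp($failed_piece = <$reader>);
        last;
    }

    # At the cap: block until one child exits, then check the pipe again.
    if (keys %running >= $max_children) {
        my $done = wait();
        delete $running{$done};
        next;
    }

    ++$num;
    my $pid = fork();
    die "could not fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: only a failed download writes to the pipe.
        close $reader;
        print {$writer} "$num\n" unless fetch_piece($num);
        close $writer;
        exit 0;
    }
    $running{$pid} = 1;
}

close $writer;
waitpid($_, 0) for keys %running;
print "stopped after piece $failed_piece failed\n";

# Hypothetical stand-in for the wget call: returns true on success and
# pretends that only pieces 1 to 3 exist.
sub fetch_piece {
    my ($n) = @_;
    return $n <= 3;
}
```

Because the parent keeps its own copy of the write end open until the loop ends, can_read only reports the pipe as readable when a child has actually written something.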
Finally, there are also modules that help with external commands, for example IPC::Run.
† To run the callback right as each child exits use reap_finished_children. See this post.