Can't fork more than 200 processes, sometimes

Posted 2019-06-04 12:45

Here's the guts of the program, using Parallel::ForkManager. It seems to stop at 200 processes; sometimes it's around 30, depending on the size of the PostgreSQL query that collects URLs to send to Mojo::UserAgent. There seems to be a hard limit somewhere. Is there a better way to write this so that I don't run into that limit? The machine it's running on has 16 CPUs and 128 GB of memory, so it can certainly run more than 200 processes that will die after the Mojo::UserAgent timeout, which is generally 2 seconds.

use Parallel::ForkManager;
use Mojo::Base -strict;
use Mojo::UserAgent;
use Mojo::Pg;
use Math::Random::Secure qw(rand irand);
use POSIX qw(strftime);
use Socket;
use GeoIP2::Database::Reader;
use File::Spec::Functions qw(:ALL);
use File::Basename qw(dirname);

use feature 'say';

my $timeout  = 2;      # Mojo::UserAgent timeout, in seconds
my $max_kids = 500;
my @url;

sub do_auth {
...
        push( @url, $authurl );
}


do_auth();

my $pm = Parallel::ForkManager->new($max_kids);

LINKS:
foreach my $linkarray (@url) {
    $pm->start and next LINKS;    # fork a child; the parent skips to the next URL
    my $ua = Mojo::UserAgent->new( max_redirects => 5, request_timeout => $timeout );
    $ua->get($linkarray);
    $pm->finish;                  # child exits here
}

$pm->wait_all_children;

2 Answers
三岁会撩人 · 2019-06-04 13:18

For your example code (fetching a URL), I would never use Parallel::ForkManager. I would use Mojo::IOLoop::Delay or the non-blocking calling style.

use Mojo::UserAgent;
use feature 'say';

my $ua = Mojo::UserAgent->new;

$ua->inactivity_timeout(15);
$ua->connect_timeout(15);
$ua->request_timeout(15);
$ua->max_connections(0);    # 0 disables keep-alive, so each connection closes after use

my @url = ("http://stackoverflow.com/questions/41253272/joining-a-view-and-a-table-in-mysql",
           "http://stackoverflow.com/questions/41252594/develop-my-own-website-builder",
           "http://stackoverflow.com/questions/41251919/chef-mysql-server-configuration",
           "http://stackoverflow.com/questions/41251689/sql-trigger-update-error",
           "http://stackoverflow.com/questions/41251369/entity-framework-how-to-add-complex-objects-to-db",
           "http://stackoverflow.com/questions/41250730/multi-dimensional-array-from-matching-mysql-columns",
           "http://stackoverflow.com/questions/41250528/search-against-property-in-json-object-using-mysql-5-6",
           "http://stackoverflow.com/questions/41249593/laravel-time-difference",
           "http://stackoverflow.com/questions/41249364/variable-not-work-in-where-clause-php-joomla");

foreach my $linkarray (@url) {
    # Run all requests at the same time
    $ua->get($linkarray => sub {
        my ($ua, $tx) = @_;
        say $tx->res->dom->at('title')->text;
    });
}

Mojo::IOLoop->start unless Mojo::IOLoop->is_running;
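
One caveat with this fully non-blocking style: it starts every request at once, which for a large URL list can itself exhaust file descriptors. Below is a rough sketch of one way to cap the number of in-flight requests with a simple queue; it assumes the same @url list as above, and the limit of 8 is an arbitrary illustrative value.

use Mojo::Base -strict;
use Mojo::UserAgent;
use Mojo::IOLoop;
use feature 'say';

my $ua         = Mojo::UserAgent->new;
my @queue      = @url;    # same URL list as above (assumption)
my $active     = 0;
my $max_active = 8;       # arbitrary cap on concurrent requests

sub fetch_next {
    # Start requests until the cap is reached or the queue is empty
    while ( $active < $max_active and @queue ) {
        my $link = shift @queue;
        $active++;
        $ua->get($link => sub {
            my ($ua, $tx) = @_;
            $active--;
            # Guard the DOM lookup: error responses may have no <title>
            my $title = eval { $tx->res->dom->at('title')->text };
            say $title // "no title for $link";
            fetch_next();    # refill the pipeline
            Mojo::IOLoop->stop unless $active or @queue;
        });
    }
}

fetch_next();
Mojo::IOLoop->start if $active and not Mojo::IOLoop->is_running;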
乱世女痞 · 2019-06-04 13:31

Most likely you are running into an operating-system limit on threads or processes. The quick and dirty way to fix this would be to increase the limit, which is usually configurable (on Linux, for example, ulimit -u shows the per-user process cap). That said, rewriting the code not to use so many short-lived processes is a more scalable solution.
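
If you want to check that limit from inside Perl, here is a minimal sketch, assuming the BSD::Resource CPAN module (not in core Perl) is installed; RLIMIT_NPROC is the per-user cap on simultaneous processes that fork() runs into.

use Mojo::Base -strict;
use feature 'say';

# getrlimit returns the soft and hard limits in list context
use BSD::Resource qw(getrlimit RLIMIT_NPROC);

my ( $soft, $hard ) = getrlimit(RLIMIT_NPROC);
say "soft process limit: $soft, hard limit: $hard";

If the soft limit turns out to be near the numbers you are seeing (around 200, once your other processes are counted), raising it with ulimit -u or the limits.conf equivalent should confirm the diagnosis.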
