I'm making a lot of HTTP requests and chose HTTP::Async for the job. I have over 1000 requests to make, and if I simply do the following, many of them time out before they get processed, because it can take tens of minutes before processing gets to them:
    for my $url (@urls) {
        $async->add(HTTP::Request->new(GET => $url));
    }
    while (my $resp = $async->wait_for_next_response) {
        # use $resp
    }
So I decided to do 25 requests at a time, but I can't think of a way to express it in code.
I tried the following:
    while (@urls) {
        # queue up to 25 requests
        for (1..25) {
            my $url = shift @urls;
            last unless defined $url;
            $async->add(HTTP::Request->new(GET => $url));
        }
        # drain the whole batch before queuing the next one
        while (my $resp = $async->wait_for_next_response) {
            # use $resp
        }
    }
This, however, is too slow: it waits until all 25 requests have been processed before it adds another 25. So even if only 2 requests are still outstanding, nothing new is added; I have to wait for the whole batch to finish before the next batch of 25 goes in.
How could I improve this logic so that $async keeps doing something while I process records, while also making sure the requests don't time out?
You're close, you just need to combine the two approaches! :-)
Untested, so think of it as pseudocode. In particular, I am not sure whether total_count is the right method to use; the documentation doesn't say. You could also just keep an $active_requests counter that you ++ when adding a request and -- when you get a response.
    while (1) {
        # if there aren't already 25 requests "active", then add more
        while (@urls and $async->total_count < 25) {
            my $url = shift @urls;
            $async->add( ... );
        }
        # deal with any finished requests right away, we wait for a
        # second just so we don't spin in the main loop too fast.
        while (my $response = $async->wait_for_next_response(1)) {
            # use $response
        }
        # finish the main loop when there's no more work
        last unless ($async->total_count or @urls);
    }
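The $active_requests counter variant mentioned above could look roughly like the sketch below. It is only an illustration: run_windowed and FakeAsync are hypothetical names, not part of HTTP::Async, and FakeAsync is a stand-in that hands requests straight back so the loop can be run without network access. The sketch also assumes that every added request eventually produces exactly one response.

```perl
use strict;
use warnings;

# Stand-in for HTTP::Async (hypothetical, for demonstration only): it
# "completes" requests in the order they were added and records the
# largest number that were ever in flight at once.
package FakeAsync {
    sub new { return bless { queue => [], peak => 0 }, shift }

    sub add {
        my ($self, $request) = @_;
        push @{ $self->{queue} }, $request;
        my $in_flight = @{ $self->{queue} };
        $self->{peak} = $in_flight if $in_flight > $self->{peak};
        return;
    }

    # The timeout argument is accepted but ignored by this stand-in.
    sub wait_for_next_response {
        my ($self) = @_;
        return shift @{ $self->{queue} };
    }

    sub peak { return $_[0]{peak} }
}

# Hypothetical helper: keeps at most $max_active requests in flight,
# tracking them with a plain counter instead of total_count.
# $async must provide add($request) and wait_for_next_response($timeout).
sub run_windowed {
    my ($async, $requests, $max_active, $on_response) = @_;
    my @pending = @$requests;    # work on a copy of the caller's list
    my $active  = 0;             # requests added but not yet returned

    while (@pending or $active) {
        # Top up the window whenever a slot is free.
        while (@pending and $active < $max_active) {
            $async->add(shift @pending);
            $active++;
        }

        # Wait up to one second for a response, then go back to topping up.
        if (my $response = $async->wait_for_next_response(1)) {
            $active--;
            $on_response->($response);
        }
    }
    return;
}

my $fake = FakeAsync->new;
my @done;
run_windowed($fake, [ map { "request-$_" } 1 .. 60 ], 25, sub { push @done, $_[0] });
printf "%d responses, peak in flight: %d\n", scalar @done, $fake->peak;
```

With the stand-in this prints "60 responses, peak in flight: 25". To try it against real traffic, you would pass an HTTP::Async object and a list of HTTP::Request objects instead. Because the window is refilled after every single response, a slow request only holds up its own slot and not the other 24.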
If you can't call wait_for_next_response fast enough because you're in the middle of executing other code, the simplest solution is to make the code interruptible by moving it to a separate thread of execution. But if you're going to start using threads, why use HTTP::Async?
    use threads;
    use Thread::Queue::Any 1.03;
    use LWP::UserAgent;
    use HTTP::Request;

    use constant NUM_WORKERS => 25;

    my $req_q = Thread::Queue::Any->new();
    my $res_q = Thread::Queue::Any->new();

    # start the workers; each one handles requests until it dequeues undef
    my @workers;
    for (1..NUM_WORKERS) {
        push @workers, async {
            my $ua = LWP::UserAgent->new();
            while (my $req = $req_q->dequeue()) {
                $res_q->enqueue( $ua->request($req) );
            }
        };
    }

    for my $url (@urls) {
        $req_q->enqueue( HTTP::Request->new( GET => $url ) );
    }

    # one undef per worker tells it to stop
    $req_q->enqueue(undef) for @workers;

    # collect exactly one response per URL
    for (1..@urls) {
        my $res = $res_q->dequeue();
        ...
    }

    $_->join() for @workers;