I've been using Perl's WWW::Mechanize library, but for some reason the timeout parameter doesn't seem to work with HTTPS (I'm using Crypt::SSLeay for SSL).
my $browser = WWW::Mechanize->new(autocheck=>0, timeout=>3);
Has anyone encountered this before and knows how to fix it? Thanks!
For HTTPS/SSL you have to use a workaround: fetch the page with wget, which enforces its own timeout, and then hand the HTML to Mechanize:
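# wget enforces the timeout itself (-T seconds, -t 1 = a single try); \Q...\E shell-escapes $url
my $html = `wget -q -t 1 -T $timeout -O - \Q$url\E`;
$mech->get(0);              # dummy request, presumably so $mech has a current page to update
$mech->update_html($html);  # replace that page's content with wget's output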
Testing it just now against https://www.sourceforge.net/, I get the impression that the timeout argument does work, but not until after the HTTPS negotiation has taken place. When I set the timeout very low, to a fractional value, it does report a timeout correctly, but only after a delay much longer than the timeout value, at which point it returns the timeout error immediately.
Example:
#!/usr/bin/perl
use strict;
use warnings;
$|=1;
# This "works", downloading the page within the timeout period
use WWW::Mechanize;
my $mech = WWW::Mechanize->new(
    timeout => 3,
);
$mech->get( 'https://www.sourceforge.net/' );
print "Successful get.\n";
# This throws a connect timeout, but after a delay much longer than 50ms
my $mech2 = WWW::Mechanize->new(
    timeout => 0.05,
);
$mech2->get( 'https://www.sourceforge.net/' );
print "Successful get 2.\n";
Output:
Successful get.
Error GETing http://sourceforge.net/: Can't connect to sourceforge.net:80
(connect: timeout) at ./throwaway22855.pl line 20
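To put numbers on that delay, a small timing harness can be wrapped around each `get`. This is a sketch: `time_call` is a hypothetical helper that uses Time::HiRes for sub-second resolution and catches the error the example above dies with, so you can compare the elapsed time against the configured timeout.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # time() with sub-second resolution

# Run $code and return (elapsed wall-clock seconds, error message or undef).
sub time_call {
    my ($code) = @_;
    my $start = time();
    my $ok    = eval { $code->(); 1 };
    my $err   = $ok ? undef : $@;
    return (time() - $start, $err);
}
```

For example, `my ($secs, $err) = time_call(sub { $mech2->get('https://www.sourceforge.net/') }); printf "took %.2fs\n", $secs;` would show how far past the 0.05 s timeout the error actually arrives.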
It appears the timeout is handled deep down in IO::Socket, using select. On some systems this may interfere with SIGALRM, so if you want to work around it and write your own timeout, make sure you read your platform's implementation docs. Also note (see perldoc perlipc) that Perl has used deferred signals since 5.8.x, so setting an alarm by hand may not work without using the sigprocmask workaround.
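A minimal sketch of such a hand-rolled timeout, assuming a POSIX system. `with_timeout` is a hypothetical helper name. It installs the SIGALRM handler with `POSIX::sigaction` and the `safe` flag off, so the handler is not deferred the way `%SIG` handlers are since 5.8; dying from such an "unsafe" handler carries the risks perlipc describes, so treat this as an experiment rather than a drop-in fix. The builtin `alarm` only takes whole seconds (Time::HiRes offers finer granularity).

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Run $code, aborting it with die "timeout\n" if it takes longer than
# $seconds (whole seconds only, because of the builtin alarm()).
sub with_timeout {
    my ($seconds, $code) = @_;

    # Install the handler through sigaction with the "safe" flag off, so it
    # is not deferred until Perl's next safe point.
    my $action = POSIX::SigAction->new(sub { die "timeout\n" });
    $action->safe(0);
    my $old = POSIX::SigAction->new;
    POSIX::sigaction(SIGALRM, $action, $old)
        or die "sigaction failed: $!";

    my @result = eval {
        alarm $seconds;
        my @r = $code->();
        alarm 0;
        @r;
    };
    my $err = $@;

    alarm 0;                              # make sure no alarm is left pending
    POSIX::sigaction(SIGALRM, $old);      # restore the previous handler
    die $err if $err;
    return wantarray ? @result : $result[0];
}
```

With Mechanize this would look like `my $res = eval { with_timeout(3, sub { $mech->get($url) }) };`, with `$@ eq "timeout\n"` signalling that the alarm fired. Whether the alarm can interrupt Crypt::SSLeay during the handshake is not verified here.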
There is some more information here:
SIGALRM Timeout -- How does it affect existing operations?