Perl Mechanize timeout not working with https

Posted 2019-09-04 09:03

Question:

I've been using Perl's WWW::Mechanize library, but for some reason the timeout parameter doesn't work with https (I'm using Crypt::SSLeay for SSL).

my $browser = WWW::Mechanize->new(autocheck=>0, timeout=>3);

Has anyone encountered this before and knows how to fix it? Thanks!

Answer 1:

For HTTPS/SSL you have to use a workaround:

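# Fetch the page with wget, which enforces its own timeout (-T) and makes one attempt (-t 1)
my $html = `wget -q -t 1 -T $timeout -O - $url`;
$mech->get(0);
# Hand the fetched HTML to Mechanize so its parsing methods operate on it
$mech->update_html($html);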


Answer 2:

In just testing it now against https://www.sourceforge.net/, I get the impression that the timeout argument does work, but that it doesn't work until after the HTTPS negotiation occurs. I set the timeout really low, to a fractional value, and it reports a timeout correctly, but there is a delay much longer than my timeout value, and then it immediately returns with a timeout error.

Example:

#!/usr/bin/perl

use strict;
use warnings;
$|=1;

# This "works", downloading the page within the timeout period
use WWW::Mechanize;
my $mech = WWW::Mechanize->new(
    timeout => 3,
);
$mech->get( 'https://www.sourceforge.net/' );
print "Successful get.\n";

# This throws a connect timeout, but after a delay much longer than 50ms
my $mech2 = WWW::Mechanize->new(
    timeout => 0.05,
);
$mech2->get( 'https://www.sourceforge.net/' );
print "Successful get 2.\n";

Output:

Successful get.
Error GETing http://sourceforge.net/: Can't connect to sourceforge.net:80
(connect: timeout) at ./throwaway22855.pl line 20

It appears the timeout is handled deep down below in IO::Socket, using select. On some systems, this may interfere with SIGALRM, so if you want to work around this and write your own timeout, make sure you read your platform's implementation docs. Also note (in perldoc perlipc) that Perl has used deferred signals since 5.8.x, so setting an alarm by hand may not work without using the sigprocmask workaround.
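A hand-written timeout along those lines might look like the sketch below. It installs the SIGALRM handler through POSIX::sigaction, which by default delivers the signal immediately instead of deferring it; this is one common way around deferred signals and may not be the exact workaround meant above. The helper name with_hard_timeout and its return convention are made up for illustration, and the behaviour can differ between platforms.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper (not part of WWW::Mechanize): run $code, but give up
# after $seconds of wall-clock time. alarm() only takes whole seconds.
# Returns (1, @results) on success, or (0, $error) on timeout or failure.
sub with_hard_timeout {
    my ($seconds, $code) = @_;

    # Install the handler via POSIX::sigaction so it fires immediately
    # rather than being deferred until the current operation completes.
    my $old_action = POSIX::SigAction->new;
    my $action     = POSIX::SigAction->new( sub { die "hard timeout\n" } );
    POSIX::sigaction( SIGALRM, $action, $old_action )
        or die "Cannot install SIGALRM handler: $!\n";

    my @result;
    my $ok = eval {
        alarm $seconds;
        @result = $code->();
        alarm 0;
        1;
    };
    my $err = $@;

    alarm 0;                                   # cancel any pending alarm
    POSIX::sigaction( SIGALRM, $old_action );  # restore the previous handler

    return $ok ? ( 1, @result ) : ( 0, $err );
}

# Possible use with Mechanize (assumes $mech and $url already exist):
#   my ($ok, $res_or_err) = with_hard_timeout( 3, sub { $mech->get($url) } );
#   warn "Request aborted: $res_or_err" unless $ok;
```

Because the handler can interrupt the request in the middle of the SSL handshake, it is probably safest to discard the Mechanize object after a timeout rather than reuse it.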

There is some more information here: SIGALRM Timeout -- How does it affect existing operations?