Perl WWW::Mechanize (or LWP) get redirect url

Posted 2019-01-25 12:06

Question:

So I am using WWW::Mechanize to crawl sites. It works great, except if I request a url such as:

http://www.levi.com/

I am redirected to:

http://us.levi.com/home/index.jsp

For my script, I need to know that this redirect took place and which URL I was redirected to. Is there any way to detect this with WWW::Mechanize or LWP and then get the redirected URL? Thanks!

Answer 1:

use strict;
use warnings;
use URI;
use WWW::Mechanize;

my $url = 'http://...';
my $mech = WWW::Mechanize->new(autocheck => 0);
$mech->max_redirect(0);
$mech->get($url);

my $status = $mech->status();
if (($status >= 300) && ($status < 400)) {
  my $location = $mech->response()->header('Location');
  if (defined $location) {
    print "Redirected to $location\n";
    $mech->get(URI->new_abs($location, $mech->base()));
  }
}

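Because max_redirect(0) stops Mechanize from following redirects on its own, a 3XX status code means the redirect target is in the response's Location header. That header may hold a relative URL, so resolve it against the base URL before requesting it.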
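
The resolution step can be tried on its own, without any network access. This is only a minimal sketch: the helper name resolve_location is made up for illustration, and the relative path in the second call is an assumed example, not something the site is known to send.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: turn a Location header value into an absolute URL,
# using the URL of the request that produced the redirect as the base.
sub resolve_location {
    my ($location, $base) = @_;
    return URI->new_abs($location, $base)->as_string;
}

# An absolute Location value comes back unchanged.
print resolve_location('http://us.levi.com/home/index.jsp', 'http://www.levi.com/'), "\n";

# A relative Location value (assumed example) is resolved against the base.
print resolve_location('/home/index.jsp', 'http://us.levi.com/'), "\n";
```

Both calls print http://us.levi.com/home/index.jsp.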



Answer 2:

You can also let Mechanize follow the redirects and then call the redirects() method on the response object, which returns the intermediate responses that led to the final one.

use strict;
use warnings;
use feature qw( say );

use WWW::Mechanize;

my $ua = WWW::Mechanize->new;
my $res = $ua->get('http://metacpan.org');

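my @redirects = $res->redirects;

# redirects() returns an empty list when no redirect happened,
# so check before indexing into it.
if (@redirects) {
    say 'request uri: ' . $redirects[-1]->request->uri;
    say 'location header: ' . $redirects[-1]->header('Location');
}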

Prints:

request uri: http://metacpan.org
location header: https://metacpan.org/

See https://metacpan.org/pod/HTTP::Response#$r-%3Eredirects. Keep in mind that more than one redirect may have taken you to your current location, so you may want to inspect every response returned by redirects().
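
To walk the whole chain rather than only the last hop, loop over everything redirects() returns. The sketch below builds a fake two-hop chain by hand with HTTP::Response so that it runs without network access. The example.com URLs and the helper names redirect_chain and fake_response are assumptions for illustration only; with a real request you would pass in the return value of $ua->get(...).

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: return one [from, to] pair for each redirect
# that led to $res, oldest first.
sub redirect_chain {
    my ($res) = @_;
    my @chain;
    for my $hop ($res->redirects) {
        my $from = $hop->request->uri->as_string;
        my $to   = $hop->header('Location');
        push @chain, [ $from, $to ];
    }
    return @chain;
}

# Hypothetical helper: build a fake response for $url and link it to
# the response that preceded it, the way LWP does when it follows a redirect.
sub fake_response {
    my ($code, $url, $location, $previous) = @_;
    my $res = HTTP::Response->new($code);
    $res->request(HTTP::Request->new(GET => $url));
    $res->header(Location => $location) if defined $location;
    $res->previous($previous)           if defined $previous;
    return $res;
}

my $hop1  = fake_response(301, 'http://example.com/',      'https://example.com/');
my $hop2  = fake_response(302, 'https://example.com/',     'https://www.example.com/', $hop1);
my $final = fake_response(200, 'https://www.example.com/', undef, $hop2);

printf "%s -> %s\n", @$_ for redirect_chain($final);
```

Prints:

http://example.com/ -> https://example.com/
https://example.com/ -> https://www.example.com/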