Perl WWW::Mechanize (or LWP): get the redirect URL

Posted 2019-01-25 12:14

I am using WWW::Mechanize to crawl sites. It works well, except when I request a URL such as:

http://www.levi.com/

I am redirected to:

http://us.levi.com/home/index.jsp

For my script I need to know that this redirect took place and which URL I was redirected to. Is there any way to detect this with WWW::Mechanize or LWP and get the redirect URL? Thanks!

2 answers
时光不老,我们不散
Reply #2 · 2019-01-25 12:34
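use strict;
use warnings;
use URI;
use WWW::Mechanize;

my $url = 'http://...';
# autocheck => 0: don't die on the non-2xx (3xx) response we expect to get.
my $mech = WWW::Mechanize->new(autocheck => 0);
# Don't follow redirects automatically, so the 3xx response itself is returned.
$mech->max_redirect(0);
$mech->get($url);

my $status = $mech->status();
if (($status >= 300) && ($status < 400)) {
  # The redirect target is in the Location header; it may be relative.
  my $location = $mech->response()->header('Location');
  if (defined $location) {
    print "Redirected to $location\n";
    # Resolve a relative Location against the current page and follow it by hand.
    $mech->get(URI->new_abs($location, $mech->base()));
  }
}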

If the status code is 3xx, check the Location response header for the redirect URL.
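A Location header may hold a relative reference such as /home/index.jsp, which is why the code above resolves it with URI->new_abs before fetching it. The sketch below isolates just that resolution step and runs offline: the 302 response is built by hand instead of fetched, and the us.example.com URLs are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use URI;

# Hypothetical 302 response with a relative Location header, built by hand
# so the example runs without contacting any server.
my $res = HTTP::Response->new(302, 'Found', [ Location => '/home/index.jsp' ]);
$res->request(HTTP::Request->new(GET => 'http://us.example.com/start'));

my $redirect_to;
if ($res->is_redirect) {    # true for any 3xx status code
    my $location = $res->header('Location');
    # Resolve a possibly relative Location against the URL that was requested.
    $redirect_to = URI->new_abs($location, $res->request->uri)->as_string
        if defined $location;
}

print "Redirected to $redirect_to\n" if defined $redirect_to;
```

With a live WWW::Mechanize object, $res would instead be the response returned by $mech->get($url) after $mech->max_redirect(0).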

Reply #3 · 2019-01-25 12:41

You can also get this information from the redirects() method of the response object.

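use strict;
use warnings;
use feature qw( say );

use WWW::Mechanize;

my $ua  = WWW::Mechanize->new;
my $res = $ua->get('http://metacpan.org');

# redirects() returns every redirect response that led here, oldest first,
# so $redirects[-1] is the last hop before the final page.
my @redirects = $res->redirects;
if (@redirects) {    # empty list when no redirect happened
    say 'request uri: ' . $redirects[-1]->request->uri;
    say 'location header: ' . $redirects[-1]->header('Location');
}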

Prints:

request uri: http://metacpan.org
location header: https://metacpan.org/

See https://metacpan.org/pod/HTTP::Response#$r-%3Eredirects. Keep in mind that more than one redirect may have brought you to your current location, so you may want to inspect every response returned by redirects().
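To see every hop rather than only the last one, you can loop over the whole list that redirects() returns. This is a minimal offline sketch: the responses and their previous() links are built by hand (fake_response is a hypothetical helper, and the example.com URLs are made up), so it runs without a network. In a real crawl you would loop over the response returned by $mech->get($url) instead.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use URI;

# Hypothetical helper: a response with the given status for a GET of $url,
# optionally carrying a Location header.
sub fake_response {
    my ($code, $url, $location) = @_;
    my $res = HTTP::Response->new($code);
    $res->request(HTTP::Request->new(GET => $url));
    $res->header(Location => $location) if defined $location;
    return $res;
}

# Simulated chain: http://example.com/ -> https://example.com/ -> https://example.com/home
my $hop1  = fake_response(301, 'http://example.com/',  'https://example.com/');
my $hop2  = fake_response(302, 'https://example.com/', '/home');    # relative Location
my $final = fake_response(200, 'https://example.com/home');
$hop2->previous($hop1);
$final->previous($hop2);

# redirects() follows the previous() links and returns the hops oldest first.
my @hops;
for my $r ($final->redirects) {
    my $from = $r->request->uri;
    my $to   = URI->new_abs($r->header('Location'), $from);
    push @hops, "$from -> $to";
    printf "%d %s\n", $r->code, $hops[-1];
}

# The URL that was finally fetched is the request URI of the last response.
my $final_url = $final->request->uri->as_string;
print "final: $final_url\n";
```

WWW::Mechanize also exposes the final URL directly as $mech->uri after a get, which is often enough for a crawler to detect that it ended up somewhere other than the URL it requested.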
