How can I get the destination URL using cURL?

Posted 2019-01-10 23:22

Question:

How can I get the destination URL using cURL when the HTTP status code is 302?

<?PHP
$url = "http://www.ecs.soton.ac.uk/news/";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
$status_code = curl_getinfo($ch,CURLINFO_HTTP_CODE);

if($status_code == 302 or $status_code == 301){
  $url = "";
  // I want to get the destination url
}
curl_close($ch);
?>

Answer 1:

You can use:

echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);


Answer 2:

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, TRUE); // We'll parse redirect url from header.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE); // We want to just get redirect url but not to follow it.
$response = curl_exec($ch);
preg_match_all('/^Location:(.*)$/mi', $response, $matches);
curl_close($ch);
echo !empty($matches[1]) ? trim($matches[1][0]) : 'No redirect found';


Answer 3:

You have to grab the Location header for the redirected URL.
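
As an alternative to parsing the header by hand, PHP's cURL extension also exposes `CURLINFO_REDIRECT_URL`, which reports the redirect target of the last response when redirects are not being followed. The sketch below is a minimal illustration under that assumption; the function name `next_redirect_url` is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: performs one request without following redirects and
// returns the URL the server redirects to, or null if there is no redirect
// or the request failed.
function next_redirect_url(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // do not print the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);  // stay on the first response
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);      // give up quickly if unreachable
    if (curl_exec($ch) === false) {
        return null;
    }
    // Yields a non-empty string only when the response carried a redirect.
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return (is_string($target) && $target !== '') ? $target : null;
}
```

Calling `next_redirect_url($url)` with the URL from the question should then give the destination of a 301 or 302, or null when the server answers directly.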



Answer 4:

This is a somewhat late response, but I wanted to show a full working example, since some of the solutions out there are only fragments:

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url); //set url
    curl_setopt($ch, CURLOPT_HEADER, true); //get header
    curl_setopt($ch, CURLOPT_NOBODY, true); //do not include response body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //do not show in browser the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); //follow any redirects
    curl_exec($ch);
    $new_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); //the last URL requested after following redirects
    curl_close($ch);

This works with any redirect, such as a 301 or 302. On a 404, however, it simply returns the URL that was originally requested (since the page wasn't found). This can be used to update or remove links on your site, which was my use case.



Answer 5:

The new destination for a 302 redirect is located in the HTTP header field "Location". Example:

HTTP/1.1 302 Found
Date: Tue, 30 Jun 2002 1:20:30 GMT
Server: Apache
Location: http://www.foobar.com/foo/bar
Content-Type: text/html; charset=iso-8859-1

Just grep it with a regex.
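
A minimal sketch of that regex step, run against a header block like the one above rather than a live request. The helper name `extract_location` is hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): returns the value of the
// first Location header found in a raw header string, or null if there is none.
function extract_location(string $rawHeaders): ?string
{
    // Header names are case-insensitive ("i"); "m" lets ^ and $ match per line.
    if (preg_match('/^Location:[ \t]*(.*)$/mi', $rawHeaders, $m)) {
        return trim($m[1]); // trim() also removes the trailing "\r"
    }
    return null;
}

$sample = "HTTP/1.1 302 Found\r\n"
        . "Server: Apache\r\n"
        . "Location: http://www.foobar.com/foo/bar\r\n"
        . "Content-Type: text/html; charset=iso-8859-1\r\n\r\n";

echo extract_location($sample), "\n"; // prints http://www.foobar.com/foo/bar
```

Note that a server may send a relative Location value, in which case the caller still has to resolve it against the requested URL.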

To include the HTTP headers in the result, enable the curl option CURLOPT_HEADER:

curl_setopt($c, CURLOPT_HEADER, true);

If you simply want curl to follow the redirection, use CURLOPT_FOLLOWLOCATION:

curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);

Anyway, you shouldn't use the new URI for future requests, because HTTP status code 302 is only a temporary redirect.



Answer 6:

In response to user437797's comment on Tamik Soziev's answer (I unfortunately do not have the reputation to comment there directly):

CURLINFO_EFFECTIVE_URL works fine, but for it to do what the OP wants you also have to set CURLOPT_FOLLOWLOCATION to TRUE. This is because CURLINFO_EFFECTIVE_URL returns exactly what it says: the effective URL that ends up getting loaded. If you don't follow redirects, this will be the URL you requested; if you do follow redirects, it will be the final URL that is redirected to.

The nice thing about this approach is that it also works with multiple redirects, whereas when retrieving and parsing the HTTP header yourself you may have to do that multiple times before the final destination url is exposed.

Also note that the maximum number of redirects curl follows can be controlled via CURLOPT_MAXREDIRS. A value of -1 means unlimited, which was libcurl's historical default; PHP's cURL extension documents a default of 20. An unlimited setting may get you into trouble if someone (perhaps intentionally) has configured an endless redirect loop for some URL.
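
A minimal sketch of combining CURLOPT_FOLLOWLOCATION, CURLOPT_MAXREDIRS and CURLINFO_EFFECTIVE_URL as described above. The helper names `redirect_options` and `resolve_final_url` and the limit of 5 hops are assumptions chosen for illustration, not values from the original answer.

```php
<?php
// Hypothetical helper: builds cURL options that follow redirects, but at most
// $maxRedirects hops, so a redirect loop cannot run forever.
function redirect_options(string $url, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the response instead of printing it
        CURLOPT_NOBODY         => true,  // headers are enough to learn the final URL
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => $maxRedirects,
    ];
}

// Hypothetical helper: returns the final URL, or null if the request failed
// (which includes exceeding the redirect limit).
function resolve_final_url(string $url, int $maxRedirects = 5): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, redirect_options($url, $maxRedirects));
    if (curl_exec($ch) === false) {
        return null;
    }
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    return is_string($finalUrl) ? $finalUrl : null;
}
```

For example, `resolve_final_url($url, 3)` would give up with null on a URL that redirects more than three times instead of looping.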



Answer 7:

Here's a way to get all headers returned by a curl http request, as well as the status code and an array of header lines for each header.

$url = 'http://google.com';
$opts = array(CURLOPT_URL => $url,
              CURLOPT_RETURNTRANSFER => true,
              CURLOPT_HEADER => true,
              CURLOPT_FOLLOWLOCATION => true);

$ch = curl_init();
curl_setopt_array($ch, $opts);
$return = curl_exec($ch);
curl_close($ch);

$headers = http_response_headers($return);
foreach ($headers as $header) {
    $str = http_response_code($header);
    $hdr_arr = http_response_header_lines($header);
    if (isset($hdr_arr['Location'])) {
        $str .= ' - Location: ' . $hdr_arr['Location'];
    }
    echo $str . '<br />';
}

function http_response_headers($ret_str)
{
    $hdrs = array();
    $arr = explode("\r\n\r\n", $ret_str);
    foreach ($arr as $each) {
        if (substr($each, 0, 4) == 'HTTP') {
            $hdrs[] = $each;
        }
    }
    return $hdrs;
}

function http_response_header_lines($hdr_str)
{
    $lines = explode("\n", $hdr_str);
    $hdr_arr['status_line'] = trim(array_shift($lines));
    foreach ($lines as $line) {
        list($key, $val) = explode(':', $line, 2);
        $hdr_arr[trim($key)] = trim($val);
    }
    return $hdr_arr;
}

function http_response_code($str)
{
    return substr(trim(strstr($str, ' ')), 0, 3);
}


Answer 8:

Use curl_getinfo($ch), and the first element (url) would indicate the effective URL.



Tags: php html http curl