How can I check if a URL exists via PHP?

2020-01-22 11:55发布

How do I check if a URL exists (not 404) in PHP?

标签: php url
21条回答
We Are One
2楼-- · 2020-01-22 12:06

I run some tests to see if links on my site are valid - alerts me to when third parties change their links. I was having an issue with a site that had a poorly configured certificate that meant that php's get_headers didn't work.

SO, I read that curl was faster and decided to give that a go. then i had an issue with linkedin which gave me a 999 error, which turned out to be a user agent issue.

I didn't care if the certificate was not valid for this test, and i didn't care if the response was a re-direct.

Then I figured use get_headers anyway if curl was failing....

Give it a go....

/**
 * returns true/false if the $url is present.
 *
 * @param string $url assumes this is a valid url.
 *
 * @return bool
 */
private function url_exists (string $url): bool
{
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_NOBODY, TRUE);             // this does a head request to make it faster.
  curl_setopt($ch, CURLOPT_HEADER, TRUE);             // just the headers
  curl_setopt($ch, CURLOPT_SSL_VERIFYSTATUS, FALSE);  // turn off that pesky ssl stuff - some sys admins can't get it right.
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
  // set a real user agent to stop linkedin getting upset.
  curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36');
  curl_exec($ch);
  $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
  if (($http_code >= HTTP_OK && $http_code < HTTP_BAD_REQUEST) || $http_code === 999)
  {
    curl_close($ch);
    return TRUE;
  }
  $error = curl_error($ch); // used for debugging.
  curl_close($ch);
  // just try the get_headers - it might work!
  stream_context_set_default(array('http' => array('method' => 'HEAD')));
  $file_headers = @get_headers($url);
  if ($file_headers)
  {
    $response_code = substr($file_headers[0], 9, 3);
    return $response_code >= 200 && $response_code < 400;
  }
  return FALSE;
}
查看更多
Deceive 欺骗
3楼-- · 2020-01-22 12:11

karim79's get_headers() solution didn't worked for me as I gotten crazy results with Pinterest.

get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(): Failed to enable crypto

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
)

get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed

Array
(
    [url] => https://www.pinterest.com/jonathan_parl/
    [exists] => 
) 

Anyway, this developer demonstrates that cURL is way faster than get_headers():

http://php.net/manual/fr/function.get-headers.php#104723

Since many people asked for karim79 to fix is cURL solution, here's the solution I built today.

/**
* Send an HTTP request to a the $url and check the header posted back.
*
* @param $url String url to which we must send the request.
* @param $failCodeList Int array list of code for which the page is considered invalid.
*
* @return Boolean
*/
public static function isUrlExists($url, array $failCodeList = array(404)){

    $exists = false;

    if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){

        $url = "https://" . $url;
    }

    if (preg_match(RegularExpression::URL, $url)){

        $handle = curl_init($url);


        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);

        curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);

        curl_setopt($handle, CURLOPT_HEADER, true);

        curl_setopt($handle, CURLOPT_NOBODY, true);

        curl_setopt($handle, CURLOPT_USERAGENT, true);


        $headers = curl_exec($handle);

        curl_close($handle);


        if (empty($failCodeList) or !is_array($failCodeList)){

            $failCodeList = array(404); 
        }

        if (!empty($headers)){

            $exists = true;

            $headers = explode(PHP_EOL, $headers);

            foreach($failCodeList as $code){

                if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){

                    $exists = false;

                    break;  
                }
            }
        }
    }

    return $exists;
}

Let me explains the curl options:

CURLOPT_RETURNTRANSFER: return a string instead of displaying the calling page on the screen.

CURLOPT_SSL_VERIFYPEER: cUrl won't checkout the certificate

CURLOPT_HEADER: include the header in the string

CURLOPT_NOBODY: don't include the body in the string

CURLOPT_USERAGENT: some site needs that to function properly (by example : https://plus.google.com)


Additional note: In this function I'm using Diego Perini's regex for validating the URL before sending the request:

const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini

Additional note 2: I explode the header string and user headers[0] to be sure to only validate only the return code and message (example: 200, 404, 405, etc.)

Additional note 3: Sometime validating only the code 404 is not enough (see the unit test), so there's an optional $failCodeList parameter to supply all the code list to reject.

And, of course, here's the unit test (including all the popular social network) to legitimates my coding:

public function testIsUrlExists(){

//invalid
$this->assertFalse(ToolManager::isUrlExists("woot"));

$this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));

$this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));

$this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));

$this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));

$this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));

$this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));

$this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));


//valid
$this->assertTrue(ToolManager::isUrlExists("www.google.ca"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));

$this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));

$this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));

$this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));

$this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));

$this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
}

Great success to all,

Jonathan Parent-Lévesque from Montreal

查看更多
祖国的老花朵
4楼-- · 2020-01-22 12:12

to check if url is online or offline ---

function get_http_response_code($theURL) {
    $headers = @get_headers($theURL);
    return substr($headers[0], 9, 3);
}
查看更多
在下西门庆
5楼-- · 2020-01-22 12:12

get_headers() returns an array with the headers sent by the server in response to a HTTP request.

$image_path = 'https://your-domain.com/assets/img/image.jpg';

$file_headers = @get_headers($image_path);
//Prints the response out in an array
//print_r($file_headers); 

if($file_headers[0] == 'HTTP/1.1 404 Not Found'){
   echo 'Failed because path does not exist.</br>';
}else{
   echo 'It works. Your good to go!</br>';
}
查看更多
不美不萌又怎样
6楼-- · 2020-01-22 12:14
function urlIsOk($url)
{
    $headers = @get_headers($url);
    $httpStatus = intval(substr($headers[0], 9, 3));
    if ($httpStatus<400)
    {
        return true;
    }
    return false;
}
查看更多
我欲成王,谁敢阻挡
7楼-- · 2020-01-22 12:15

kind of an old thread, but.. i do this:

$file = 'http://www.google.com';
$file_headers = @get_headers($file);
if ($file_headers) {
    $exists = true;
} else {
    $exists = false;
}
查看更多
登录 后发表回答