I found out why this was happening; see my answer below.
This is the only domain this happens on. I'm running curl_multi on a bunch of URLs, and this one comes back with http_code 404: http://www.breakingnews.com
But when I visit it in the browser it returns 200 OK (it takes a while to load) and doesn't even look like a redirect.
Does anyone know what's going on? Is this a common problem?
Here's a var_dump:
["info"]=> array(22) {
  ["url"]=> string(27) "http://www.breakingnews.com"
  ["content_type"]=> string(24) "text/html; charset=utf-8"
  ["http_code"]=> int(404)
  ["header_size"]=> int(337)
  ["request_size"]=> int(128)
  ["filetime"]=> int(-1)
  ["ssl_verify_result"]=> int(0)
  ["redirect_count"]=> int(0)
  ["total_time"]=> float(1.152229)
  ["namelookup_time"]=> float(0.001261)
  ["connect_time"]=> float(0.020121)
  ["pretransfer_time"]=> float(0.020179)
  ["size_upload"]=> float(0)
  ["size_download"]=> float(9755)
  ["speed_download"]=> float(8466)
  ["speed_upload"]=> float(0)
  ["download_content_length"]=> float(-1)
  ["upload_content_length"]=> float(0)
  ["starttransfer_time"]=> float(1.133522)
  ["redirect_time"]=> float(0)
  ["certinfo"]=> array(0) { }
  ["redirect_url"]=> string(0) ""
}
["error"]=> string(0) ""
UPDATE:
This actually looks like a PHP bug with curl_setopt($ch, CURLOPT_NOBODY, true);
https://bugs.php.net/bug.php?id=39611
EDIT: It's not a bug.
I found the answer in a comment here: http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/comment-page-1/#comment-12186
By setting CURLOPT_NOBODY to true, cURL will use HEAD for the request, which some servers don’t like (for example, forbes) and will return “Empty reply from server”. To fix this, you also need to set CURLOPT_HTTPGET to switch back to a GET request.
/* don’t download the page, just the header (much faster in this case) */
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPGET, true); //this is needed to fix the issue
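If you would rather keep the cheap HEAD request for servers that handle it properly, one option is to try HEAD first and only repeat the request as GET when the result looks suspicious. Here is a minimal sketch of that idea. The function names (probe_options, head_result_is_suspect, probe_status) and the 10-second timeout are my own assumptions, not something from the linked comment, and treating every 4xx/5xx as "suspect" is just one possible rule.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the original post.

/** cURL options for a status probe: HEAD when $useHead is true, otherwise a plain GET. */
function probe_options(bool $useHead): array
{
    $options = [
        CURLOPT_RETURNTRANSFER => true, // do not echo the response
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // assumed value, tune as needed
    ];
    if ($useHead) {
        $options[CURLOPT_NOBODY] = true;  // makes cURL send a HEAD request
    } else {
        $options[CURLOPT_HTTPGET] = true; // explicitly request GET
    }
    return $options;
}

/** True when the status code should not be trusted and a GET retry is worthwhile. */
function head_result_is_suspect(int $httpCode): bool
{
    // 0 means no HTTP response at all (e.g. "Empty reply from server");
    // 4xx/5xx may simply be the server rejecting HEAD.
    return $httpCode === 0 || $httpCode >= 400;
}

/** Try HEAD first, then fall back to GET; returns the last status code seen. */
function probe_status(string $url): int
{
    $code = 0;
    foreach ([true, false] as $useHead) {
        $ch = curl_init($url);
        curl_setopt_array($ch, probe_options($useHead));
        curl_exec($ch);
        $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        unset($ch); // release the handle
        if (!head_result_is_suspect($code)) {
            break;
        }
    }
    return $code;
}
```

Usage would be something like $code = probe_status("http://www.breakingnews.com"); the second request is only made when the HEAD attempt fails, so well-behaved servers still get the fast path.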
I'm not sure what your code looks like, but this works fine:
$url = "http://www.breakingnews.com";
$ch = curl_init ( $url );
// Send a browser-like User-Agent
curl_setopt ( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; pl; rv:1.9) Gecko/2008052906 Firefox/3.0" );
curl_setopt ( $ch, CURLOPT_AUTOREFERER, true );
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, true );
// Return the response instead of printing it
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, 1 );
curl_exec ( $ch );
var_dump ( curl_getinfo ( $ch ) );
if (curl_errno ( $ch )) {
    print curl_error ( $ch );
}
// Close the handle whether or not an error occurred
curl_close ( $ch );
Output:
array
  'url' => string 'http://www.breakingnews.com' (length=27)
  'content_type' => string 'text/html; charset=utf-8' (length=24)
  'http_code' => int 200
  'header_size' => int 330
  'request_size' => int 154
  'filetime' => int -1
  'ssl_verify_result' => int 0
  'redirect_count' => int 0
  'total_time' => float 4.243
  'namelookup_time' => float 0.171
  'connect_time' => float 0.374
  'pretransfer_time' => float 0.374
  'size_upload' => float 0
  'size_download' => float 68638
  'speed_download' => float 16176
  'speed_upload' => float 0
  'download_content_length' => float -1
  'upload_content_length' => float 0
  'starttransfer_time' => float 3.681
  'redirect_time' => float 0
  'certinfo' =>
    array
      empty
  'redirect_url' => string '' (length=0)