Working solution at bottom of description!
I am running PHP 5.4, and trying to get the headers of a list of URLs.
For the most part, everything is working fine, but there are three URLs that are causing issues (and likely more, with more extensive testing).
'http://www.alealimay.com'
'http://www.thelovelist.net'
'http://www.bleedingcool.com'
All three sites work fine in a browser, and produce the following header responses:
(From Safari)
Note that all three header responses are Code = 200
But retrieving the headers via PHP, using get_headers
...
stream_context_set_default(array('http' => array('method' => "HEAD")));
$headers = get_headers($url, 1);
stream_context_set_default(array('http' => array('method' => "GET")));
... returns the following:
url ...... "http://www.alealimay.com"
headers
| 0 ............................ "HTTP/1.0 400 Bad Request"
| content-length ............... "378"
| X-Synthetic .................. "true"
| expires ...................... "Thu, 01 Jan 1970 00:00:00 UTC"
| pragma ....................... "no-cache"
| cache-control ................ "no-cache, must-revalidate"
| content-type ................. "text/html; charset=UTF-8"
| connection ................... "close"
| date ......................... "Wed, 24 Aug 2016 01:26:21 UTC"
| X-ContextId .................. "QIFB0I8V/xsTFMREg"
| X-Via ........................ "1.0 echo109"
url ...... "http://www.thelovelist.net"
headers
| 0 ............................ "HTTP/1.0 400 Bad Request"
| content-length ............... "378"
| X-Synthetic .................. "true"
| expires ...................... "Thu, 01 Jan 1970 00:00:00 UTC"
| pragma ....................... "no-cache"
| cache-control ................ "no-cache, must-revalidate"
| content-type ................. "text/html; charset=UTF-8"
| connection ................... "close"
| date ......................... "Wed, 24 Aug 2016 01:26:22 UTC"
| X-ContextId .................. "aNKvf2RB/bIMjWyjW"
| X-Via ........................ "1.0 echo103"
url ...... "http://www.bleedingcool.com"
headers
| 0 ............................ "HTTP/1.1 403 Forbidden"
| Server ....................... "Sucuri/Cloudproxy"
| Date ......................... "Wed, 24 Aug 2016 01:26:22 GMT"
| Content-Type ................. "text/html"
| Content-Length ............... "5311"
| Connection ................... "close"
| Vary ......................... "Accept-Encoding"
| ETag ......................... "\"57b7f28e-14bf\""
| X-XSS-Protection ............. "1; mode=block"
| X-Frame-Options .............. "SAMEORIGIN"
| X-Content-Type-Options ....... "nosniff"
| X-Sucuri-ID .................. "11005"
This is the case regardless of changing the stream_context
//stream_context_set_default(array('http' => array('method' => "HEAD")));
$headers = get_headers($url, 1);
//stream_context_set_default(array('http' => array('method' => "GET")));
Produces the same result.
No warnings or errors are thrown for any of these (I normally have errors suppressed with @get_headers, but it makes no difference either way).
I have checked my php.ini, and have allow_url_fopen set to On.
I am headed towards stream_get_meta_data, and am not interested in CURL solutions. stream_get_meta_data (and its accompanying fopen) will fail in the same spot as get_headers, so fixing one will fix both in this case.
Usually, if there are redirects, the output looks like:
url ...... "http://www.startingURL.com/"
headers
| 0 ............................ "HTTP/1.1 301 Moved Permanently"
| 1 ............................ "HTTP/1.1 200 OK"
| Date
| | "Wed, 24 Aug 2016 02:02:29 GMT"
| | "Wed, 24 Aug 2016 02:02:32 GMT"
|
| Server
| | "Apache"
| | "Apache"
|
| Location ..................... "http://finishingURL.com/"
| Connection
| | "close"
| | "close"
|
| Content-Type
| | "text/html; charset=UTF-8"
| | "text/html; charset=UTF-8"
|
| Link ......................... "; rel=\"https://api.w.org/\", ; rel=shortlink"
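As a rough sketch of how the redirect output above can be read: with the second argument set to 1, each response's status line sits under an integer key (0, 1, ...), so the last integer key holds the final status. The helper name final_status_code is hypothetical (it is not a PHP built-in), and the sample array only mirrors the output shown above.

```php
<?php
// Hypothetical helper (not part of PHP): returns the status code of the last
// response in a get_headers($url, 1) result, or null if no status line is found.
function final_status_code(array $headers)
{
    $code = null;
    foreach ($headers as $key => $value) {
        // Status lines are stored under integer keys; named headers use string keys.
        if (is_int($key) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $value, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Sample data shaped like the redirect output above (values are illustrative).
$headers = array(
    0 => 'HTTP/1.1 301 Moved Permanently',
    1 => 'HTTP/1.1 200 OK',
    'Location' => 'http://finishingURL.com/',
    'Server' => array('Apache', 'Apache'),
);

echo final_status_code($headers), "\n"; // prints 200
```

For the three failing URLs, the same helper would report 400, 400 and 403 respectively, since each response contains a single status line under key 0.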
How come the sites work in browsers, but fail when using get_headers?
There are various SO posts discussing the same thing, but none of their solutions pertain to this case:
POST requires Content-Length (I'm sending a HEAD request, no content is returned)
URL contains UTF-8 data (The only chars in these URLs are all from the Latin alphabet)
Cannot send a URL with spaces in it (These URLs are all space-free, and very ordinary in every way)
Solution!
(Thanks to Max in the answers below for pointing me on the right track.)
The issue is that there is no pre-defined user_agent unless one is set in php.ini or declared in code.
So, I change the user_agent to mimic a browser, do the deed, and then revert it to its starting value (likely blank).
$OriginalUserAgent = ini_get('user_agent');
ini_set('user_agent', 'Mozilla/5.0');
$headers = @get_headers($url, 1);
ini_set('user_agent', $OriginalUserAgent);
User agent change found here.
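A possible alternative, sketched here rather than taken from the answers: instead of changing the global user_agent ini setting and restoring it, the agent can be passed per request through an HTTP stream context, which also fits the fopen / stream_get_meta_data route mentioned earlier. The helper names head_context_options and fetch_header_lines are hypothetical, the 'Mozilla/5.0' default simply mirrors the value used above, and the use of ignore_errors (so that headers are still returned on 4xx/5xx responses) is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: builds HTTP stream-context options for a HEAD request
// with an explicit User-Agent. The default agent string is an assumption.
function head_context_options($userAgent = 'Mozilla/5.0')
{
    return array(
        'http' => array(
            'method'        => 'HEAD',
            'user_agent'    => $userAgent,
            'ignore_errors' => true, // keep the stream open on 4xx/5xx so headers can be read
        ),
    );
}

// Hypothetical helper: returns the raw response header lines for $url,
// or false if the stream could not be opened.
function fetch_header_lines($url)
{
    $context = stream_context_create(head_context_options());
    $handle = @fopen($url, 'r', false, $context);
    if ($handle === false) {
        return false;
    }
    // For the http wrapper, wrapper_data holds the response header lines.
    $meta = stream_get_meta_data($handle);
    fclose($handle);
    return isset($meta['wrapper_data']) ? $meta['wrapper_data'] : false;
}
```

Calling fetch_header_lines('http://www.bleedingcool.com') would then return a flat list of header lines rather than the keyed array that get_headers($url, 1) produces, so any code that reads named keys would need adjusting. The context only affects the one fopen call, so nothing has to be reverted afterwards.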