For some reason I can't seem to get this particular web page's contents via cURL. I've managed to use cURL to get to the "top level page" contents fine, but the same self-built quick cURL function doesn't seem to work for one of the linked off sub web pages.
Top level page: http://www.deindeal.ch/
A sub page: http://www.deindeal.ch/deals/hotel-cristal-in-nuernberg-30/
My cURL function (in functions.php)
function curl_get($url) {
$ch = curl_init();
$header = array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Accept-Language: en-us;q=0.8,en;q=0.6'
);
$options = array(
CURLOPT_URL => $url,
CURLOPT_HEADER => 0,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13',
CURLOPT_HTTPHEADER => $header
);
curl_setopt_array($ch, $options);
$return = curl_exec($ch);
curl_close($ch);
return $return;
}
PHP file to get the contents (using echo for testing)
require "functions.php";
require "phpQuery.php";
echo curl_get('http://www.deindeal.ch/deals/hotel-walliserhof-zermatt-2-naechte-30/');
So far I've attempted the following to get this to work
- Ran the file both locally (XAMPP) and remotely (LAMP).
- Added in the user-agent and HTTP headers as recommended here file_get_contents and CURL can't open a specific website - before the function
curl_get()
contained all the options as current, except for CURLOPT_USERAGENTand
CURLOPT_HTTPHEADERS`.
Is it possible for a website to completely block requests via cURL or other remote file opening mechanisms, regardless of how much data is supplied to attempt to make a real browser request?
Also, is it possible to diagnose why my requests are turning up with nothing?
Any help answering the above two questions, or editing/making suggestions to get the file's contents, even if through a method different than cURL would be greatly appreciated ;).
Try adding:
to your options.
If you run a simple curl request from the command line (including a
-i
to see the response headers) then it is pretty easy to see:As you can see, it returns a 302 with a Location header. If you hit that location directly, you will get the content you are looking for.
And to answer your two questions:
EDIT
Ah, I see what you are talking about now. So, when you go to that link for the first time you are redirected and a cookie (or cookies) is set. Once you have those cookie, your request goes through as intended.
So, you need to use a cookiejar, like in this example: http://icfun.blogspot.com/2009/04/php-how-to-use-cookie-jar-with-curl.html
So, you will need to make an initial request, save the cookies, and make your subsequent requests including the cookies after that.