I am using the simple_html_dom library, but I cannot get the HTML content for one particular URL; instead I get a 503 error. Here is my code:
$base = 'http://www.amazon.com/gp/offer-listing/B001F0M4K8/ref=dp_olp_all_mbc/183-8463780-9861412?ie=UTF8&condition=new';
echo $html = file_get_html($base);
Error : Warning: file_get_contents(http://www.amazon.com/gp/offer-listing/B001F0M4K8/ref=dp_olp_all_mbc/183-8463780-9861412?ie=UTF8&condition=new) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 503 Service Unavailable in D:\xampp\htdocs\webcrawler-amazon\webcrawler-amazon\simple_html_dom.php on line 76
I am stuck here, so any help would be appreciated.
I am doing the same thing; they are sending the following to you. Sometimes you can get past it.
I think the server is simply blocking your request; you will not be able to fetch data from it using plain HTTP requests.
You can try using cURL, proxies, or both (there are ready-to-use solutions for this, such as AngryCurl or RollingCurl).
I recommend doing this with cURL: http://php.net/manual/en/book.curl.php. You can use it from PHP or on the command line, and there are tons of examples online.
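A minimal cURL sketch along these lines (the User-Agent string and headers are illustrative guesses at browser-like values; Amazon may still block automated traffic, and the parsing step assumes simple_html_dom's `str_get_html` is available):

```php
<?php
// Illustrative sketch: fetch the page with cURL while sending
// browser-like headers, then check the HTTP status before parsing.
$url = 'http://www.amazon.com/gp/offer-listing/B001F0M4K8/ref=dp_olp_all_mbc/183-8463780-9861412?ie=UTF8&condition=new';

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
    CURLOPT_FOLLOWLOCATION => true,   // follow redirects
    CURLOPT_TIMEOUT        => 30,
    // A browser-like User-Agent; plain PHP requests are easy to flag.
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    CURLOPT_HTTPHEADER     => [
        'Accept: text/html,application/xhtml+xml',
        'Accept-Language: en-US,en;q=0.9',
    ],
]);

$html   = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($status === 200 && $html !== false) {
    // Hand the fetched markup to simple_html_dom for parsing.
    $dom = str_get_html($html);
} else {
    // Amazon's anti-bot defense will often still answer with 503 here.
    echo "Request failed with HTTP $status\n";
}
```

This replaces `file_get_html($url)` (which uses `file_get_contents` internally and sends no browser headers) with a fetch you control, so you can inspect the status code instead of just getting a warning.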
It's Amazon's anti-bot defense system.
The returned page starts with the following HTML comment:
You need to either mimic the behaviour of a real customer using a browser very closely, or ask Amazon about an approved way to get data from their systems automatically. Using an API is better (and easier) than scraping web pages, anyway.