I'm using PhantomJS to get page content for given URL. The problem is that on some pages PhantomJS can not load some resources (js, css...), and the error I'm getting is:
error code 5, Operation canceled
Web page on which I can reproduce this problem is www.lifehacker.com The resources I can not get are:
- http://x.kinja-static.com/assets/stylesheets/tiger-4ee27d6612a71ee3c68440f8e9c0025c.css
- http://c.amazon-adsystem.com/aax2/amzn_ads.js
- and some others too...
The command I'm running is:
phantomjs --debug=true --cookies-file=cookies.txt --ignore-ssl-errors=true --ssl-protocol=tlsv1 fetchpage.js http://www.lifehacker.com
and even if I remove options like cookies-file, ignore-ssl-errors, ssl-protocol the result is still the same.
The fetchpage.js script is:
var webPage = require('webpage');
var system = require('system');
var page = webPage.create();
if (system.args.length === 1) {
console.log('Usage: fetchpage.js <some URL>');
phantom.exit(1);
}
var url = system.args[1];
page.open(url, function (status) {
console.log("STATUS: " + status);
if (status !== 'success') {
console.log(
"Error opening url \"" + page.reason_url
+ "\": " + page.reason
+ "\": " + page
);
phantom.exit(1);
} else {
var content = page.content;
console.log(content);
phantom.exit(1);
}
});
If I open that same page in Chrome, page loads just fine. Also if I copy those resource URLs that phantomjs can not load and paste them in Chrome, they load just fine.
I have tried to google for similar problems, but I only found some suggestions about setting timeout which did not work for me.
I have tried the same thing with phantomjs v1.9.0, 1.9.8 and 2.0.1-development.
What's even more interesting, sometimes phantomjs script manages to get full response from all resources, so I'm suspecting on cache, but I couldn't force server to avoid cache. I have tried to send custom headers through phantomjs like this:
...
var page = webPage.create();
page.customHeaders = {
"Cache-Control":"no-cache",
"Pragma":"no-cache"
};
page.open(url, function (status) {
...
but nothing changed.
I am running out of ideas..