Unable to load page resources with PhantomJS

Posted 2019-03-21 08:13

Question:

I'm using PhantomJS to get the page content for a given URL. The problem is that on some pages PhantomJS cannot load some resources (JS, CSS, ...), and the error I get is:

error code 5, Operation canceled

A web page on which I can reproduce this problem is www.lifehacker.com. The resources I cannot get are:

  • http://x.kinja-static.com/assets/stylesheets/tiger-4ee27d6612a71ee3c68440f8e9c0025c.css
  • http://c.amazon-adsystem.com/aax2/amzn_ads.js
  • and some others too...

The command I'm running is:

phantomjs --debug=true --cookies-file=cookies.txt --ignore-ssl-errors=true --ssl-protocol=tlsv1 fetchpage.js http://www.lifehacker.com

Even if I remove options such as cookies-file, ignore-ssl-errors, and ssl-protocol, the result is the same.

The fetchpage.js script is:

var webPage = require('webpage');
var system = require('system');
var page = webPage.create();

if (system.args.length === 1) {
  console.log('Usage: fetchpage.js <some URL>');
  phantom.exit(1);
}

var url = system.args[1];

page.open(url, function (status) {

  console.log("STATUS: " + status);

  if (status !== 'success') {
    console.log(
      "Error opening url \"" + page.reason_url
      + "\": " + page.reason
      + "\": " + page
    );
    phantom.exit(1);
  } else {
    var content = page.content;
    console.log(content);
    phantom.exit(0); // success: exit with status 0
  }
});

If I open that same page in Chrome, it loads just fine. If I copy the resource URLs that PhantomJS cannot load and paste them into Chrome, they also load just fine.

I have searched for similar problems, but I only found suggestions about setting a timeout, which did not work for me.

I have tried the same thing with PhantomJS v1.9.0, 1.9.8 and 2.0.1-development.

What's even more interesting is that the PhantomJS script sometimes manages to get a full response for all resources, so I suspect caching, but I couldn't force the server to bypass its cache. I have tried sending custom headers through PhantomJS like this:

...
var page = webPage.create();
page.customHeaders = {
    "Cache-Control":"no-cache",
    "Pragma":"no-cache"
};
page.open(url, function (status) {
  ...

but nothing changed.

I am running out of ideas.

Answer 1:

This is for coders who come across this page while looking for a solution to resources not loading completely in PhantomJS. I had a project where the script would stall or hang on a few resources. It was 50/50 whether it would finish or not.

After some digging I found the following page: https://github.com/ariya/phantomjs/issues/10652

The solution suggested there, setting a timeout for resources, worked for me:

page.settings.resourceTimeout = 10000;

I am not sure this fully applies to the question above, but at least the information is easier to find now, and it may be part of a solution for some.
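
To see which resources are actually hit, the timeout setting above can be paired with PhantomJS's onResourceTimeout and onResourceError callbacks. The following is only a minimal sketch: attachResourceDiagnostics is a hypothetical helper name (not part of PhantomJS), the 10000 ms value is simply the one used above, and the log format is an assumption.

```javascript
// Sketch only: attachResourceDiagnostics is a hypothetical helper, not a PhantomJS API.
// It sets a per-resource timeout and reports resources that time out or fail.
function attachResourceDiagnostics(page, timeoutMs, log) {
  // Give up on any single resource after timeoutMs milliseconds.
  page.settings.resourceTimeout = timeoutMs;

  // Called by PhantomJS when a resource exceeds resourceTimeout.
  page.onResourceTimeout = function (request) {
    log('TIMEOUT ' + request.url +
        ' (' + request.errorCode + ': ' + request.errorString + ')');
  };

  // Called by PhantomJS when a resource fails to load,
  // for example with "error code 5, Operation canceled".
  page.onResourceError = function (resourceError) {
    log('ERROR ' + resourceError.url +
        ' (' + resourceError.errorCode + ': ' + resourceError.errorString + ')');
  };

  return page;
}

// This part only runs inside PhantomJS; the guard keeps the helper loadable elsewhere.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = attachResourceDiagnostics(
    require('webpage').create(),
    10000,
    function (msg) { console.log(msg); }
  );

  page.open(system.args[1], function (status) {
    console.log('STATUS: ' + status);
    phantom.exit(status === 'success' ? 0 : 1);
  });
}
```

Run it the same way as fetchpage.js, passing the URL as the first argument. The logged URLs show whether the stalled resources are the same ones on every run, which may help tell a flaky third-party host apart from a local configuration issue.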