Unable to load page resources with PhantomJS

Posted 2019-03-21 07:27

I'm using PhantomJS to get the page content for a given URL. The problem is that on some pages PhantomJS cannot load some resources (JS, CSS, ...), and the error I get is:

error code 5, Operation canceled

The web page on which I can reproduce this problem is www.lifehacker.com. The resources I cannot get are:

The command I'm running is:

phantomjs --debug=true --cookies-file=cookies.txt --ignore-ssl-errors=true --ssl-protocol=tlsv1 fetchpage.js http://www.lifehacker.com

and even if I remove options such as cookies-file, ignore-ssl-errors and ssl-protocol, the result is still the same.

The fetchpage.js script is:

var webPage = require('webpage');
var system = require('system');
var page = webPage.create();

if (system.args.length === 1) {
  console.log('Usage: fetchpage.js <some URL>');
  phantom.exit(1);
}

var url = system.args[1];

page.open(url, function (status) {

  console.log("STATUS: " + status);

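  if (status !== 'success') {
    // Report which URL failed and why, then exit with a failure code.
    console.log(
      "Error opening url \"" + page.reason_url
      + "\": " + page.reason
    );
    phantom.exit(1);
  } else {
    // Print the rendered page content and exit with a success code.
    var content = page.content;
    console.log(content);
    phantom.exit(0);
  }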
});

If I open the same page in Chrome, it loads just fine. If I copy the resource URLs that PhantomJS cannot load and paste them into Chrome, they also load just fine.

I have searched for similar problems, but I only found suggestions about setting a timeout, which did not work for me.

I have tried the same thing with PhantomJS v1.9.0, 1.9.8 and 2.0.1-development.

What's even more interesting is that the PhantomJS script sometimes manages to get a full response for all resources, so I suspect caching is involved, but I couldn't force the server to bypass its cache. I have tried sending custom headers through PhantomJS like this:

...
var page = webPage.create();
page.customHeaders = {
    "Cache-Control":"no-cache",
    "Pragma":"no-cache"
};
page.open(url, function (status) {
  ...

but nothing changed.

I am running out of ideas.

1 answer
来,给爷笑一个
#2 · 2019-03-21 08:28

This is for coders who come across this page while looking for a solution to resources not loading completely in PhantomJS. I had a project where the script would stall or hang on a few resources. It was 50/50 whether it would finish or not.

After some digging I found the following page: https://github.com/ariya/phantomjs/issues/10652

The solution suggested there, setting a timeout for resources, worked for me:

page.settings.resourceTimeout = 10000;
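
As a rough sketch of how this setting might be wired into a script like the one in the question: the helper below sets the timeout and also registers an onResourceTimeout callback, so that resources abandoned by the timeout are logged rather than silently dropped. The helper name applyResourceTimeout and its log parameter are made up for illustration and are not part of PhantomJS; settings.resourceTimeout and onResourceTimeout are the WebPage members it relies on.

```javascript
// Hypothetical helper (not part of PhantomJS): configures a WebPage object
// so that slow resources are abandoned after timeoutMs milliseconds.
function applyResourceTimeout(page, timeoutMs, log) {
  // resourceTimeout is expressed in milliseconds.
  page.settings.resourceTimeout = timeoutMs;

  // Called once for each resource request that hits the timeout.
  page.onResourceTimeout = function (request) {
    log('TIMEOUT ' + request.url +
        ' (' + request.errorCode + ', ' + request.errorString + ')');
  };
  return page;
}

// Usage inside a PhantomJS script (assumes the layout of fetchpage.js):
//   var page = require('webpage').create();
//   applyResourceTimeout(page, 10000, function (msg) { console.log(msg); });
//   page.open(url, function (status) { /* ... */ });
```

The 10000 ms value is simply the one used above; a suitable value probably depends on how slow the affected resources are.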

I am not sure whether this fully applies to the question above, but at least the information is easier to find now, and for some it may be part of a solution.
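
One way to check whether the timeout is relevant to a given page is to record which resources actually fail and with what error. The sketch below collects failures through the onResourceError callback and formats them in the style of the "error code 5, Operation canceled" message from the question. The helper names collectResourceErrors and formatFailures are hypothetical; only onResourceError and the url, errorCode and errorString fields of its argument come from PhantomJS.

```javascript
// Hypothetical helper: records every resource that fails to load.
function collectResourceErrors(page) {
  var failures = [];
  page.onResourceError = function (resourceError) {
    failures.push({
      url: resourceError.url,
      code: resourceError.errorCode,
      reason: resourceError.errorString
    });
  };
  return failures;
}

// Hypothetical helper: turns the recorded failures into one line per resource.
function formatFailures(failures) {
  return failures.map(function (f) {
    return 'error code ' + f.code + ', ' + f.reason + ': ' + f.url;
  }).join('\n');
}

// Usage inside a PhantomJS script (assumes the layout of fetchpage.js):
//   var failures = collectResourceErrors(page);
//   page.open(url, function (status) {
//     console.log(formatFailures(failures));
//     phantom.exit(failures.length === 0 ? 0 : 1);
//   });
```

Comparing this list across runs that succeed and runs that fail may show whether the same resources are involved each time.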
