Get JavaScript rendered HTML source using phantomjs

Posted 2019-01-22 04:09

Question:

First of all, I am not looking for help with a development or testing environment. I am new to PhantomJS, and all I want is to run PhantomJS from the command line in a Linux terminal.

I have an HTML page whose body is rendered by JavaScript code. I want to download that rendered HTML content using PhantomJS.

I have no experience with PhantomJS, but I have some experience with shell scripting, so I first tried to do this with curl. Because curl does not execute JavaScript, I got only the original HTML source; the rendered content was not downloaded. I heard that Ruby's Mechanize might do this job, but I don't know Ruby. On further investigation I found the command-line tool PhantomJS. How can I do this with PhantomJS?

Please feel free to ask for any additional information I should provide.

Answer 1:

Unfortunately, that is not possible using just the PhantomJS command line. You have to use a JavaScript file to actually accomplish anything with PhantomJS.

Here is a very simple version of the script you can use:

Code mostly copied from https://stackoverflow.com/a/12469284/4499924

printSource.js

var system = require('system');
var page   = require('webpage').create();
// system.args[0] is the filename, so system.args[1] is the first real argument
var url    = system.args[1];
// render the page, and run the callback function
page.open(url, function () {
  // page.content is the source
  console.log(page.content);
  // need to call phantom.exit() to prevent from hanging
  phantom.exit();
});

To print the page source to standard output:

phantomjs printSource.js http://todomvc.com/examples/emberjs/

To save the page source to a file:

phantomjs printSource.js http://todomvc.com/examples/emberjs/ > ember.html
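
The script above reads page.content as soon as the load callback fires, so a page that keeps rendering after load may be captured too early, and a failed load is printed as if it had worked. The following is a minimal sketch of one way to handle both cases, not part of the original answer. The function name fetchRenderedSource, the optional second argument for the wait time, and the 2000 ms default are assumptions made for illustration; a fixed delay is only a rough heuristic and the right value depends on the page.

```javascript
// Hypothetical helper (not from the original answer): opens the URL, checks the
// load status, then waits waitMs milliseconds before reading the rendered DOM.
function fetchRenderedSource(page, url, waitMs, callback) {
  page.open(url, function (status) {
    if (status !== 'success') {
      callback('Failed to load ' + url, null);
      return;
    }
    // Give in-page scripts some time to finish rendering.
    setTimeout(function () {
      callback(null, page.content);
    }, waitMs);
  });
}

// This part only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  // Assumed convention: optional second argument is the wait in milliseconds.
  var waitMs = parseInt(system.args[2], 10) || 2000;

  fetchRenderedSource(page, system.args[1], waitMs, function (err, content) {
    if (err) {
      // Report errors on stderr so they do not end up in a redirected file.
      system.stderr.writeLine(err);
      phantom.exit(1);
      return;
    }
    console.log(content);
    phantom.exit(0);
  });
}
```

Assuming the sketch is saved as printSourceDelayed.js (a made-up file name), it is run the same way, with the wait time as an optional extra argument:

phantomjs printSourceDelayed.js http://todomvc.com/examples/emberjs/ 3000 > ember.html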