Using PhantomJS evaluate, how can I get the HTML?

Posted 2019-02-19 18:17

Question:

page.evaluate(function() { return document; }, function(result){    
    console.log(result)                    
    next();
});

result is actually a huge object. I don't know the properties and attributes of that object. I just want the HTML of the page as you would see it in Chrome inspector.

From the look of the object, it seems that the HTML includes CSS and JavaScript, which is weird. The user should not see the CSS and JavaScript, because they are not the web page's HTML; those are external files. I only want the HTML that the user would see.

Answer 1:

document is an HTMLDocument object, not a string. To get the entire DOM as a string, use document.documentElement.outerHTML.
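As a minimal sketch of that suggestion, the function passed to evaluate can return the string instead of the document object. This assumes the same callback style of page.evaluate used in the question; getPageHtml and fetchHtml are hypothetical names introduced here for illustration, not part of PhantomJS.

```javascript
// Runs inside the page context, so `document` here is the page's DOM.
// It returns a plain string rather than the document object itself.
function getPageHtml() {
  return document.documentElement.outerHTML;
}

// Hypothetical helper (not a PhantomJS API): `page` is assumed to be the
// same page object as in the question, with evaluate(fn, callback).
function fetchHtml(page, callback) {
  page.evaluate(getPageHtml, function (html) {
    // `html` is the serialized current DOM as a string.
    callback(html);
  });
}
```

In the question's code this would be called as fetchHtml(page, function (html) { console.log(html); next(); }).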

From outside evaluate, you can use page.content. It is a string.

I don't know what you mean by "HTML includes CSS and JavaScript" or "the web page's HTML". Are you referring to the difference between the page source and the DOM as modified by scripting? Both of the above give you the current DOM, not the original page source.