Getting Final HTML with Javascript rendered Java a

Posted 2019-01-11 12:15

Question:

I want to scrape data from an HTML page, but the reviews on it are loaded by JavaScript. A normal Java URL fetch only gives me the original HTML, without the JavaScript executed. I want the final page, after the JavaScript has run.

Example: http://www.glamsham.com/movies/reviews/rowdy-rathore-movie-review-cheers-for-rowdy-akki-051207.asp

This page shows its comments through a Facebook plugin, and they are fetched by JavaScript.

The same applies to this page: http://www.imdb.com/title/tt0848228/reviews

What should I do?

Answer 1:

Use PhantomJS: http://phantomjs.org

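var page = require('webpage').create();
var url = "http://www.glamsham.com/movies/reviews/rowdy-rathore-movie-review-cheers-for-rowdy-akki-051207.asp";
page.open(url, function (status) {
    if (status !== "success") { console.log("Failed to load " + url); phantom.exit(1); return; }
    // Give the page's scripts time to finish before reading the result
    setTimeout(function () {
        // Where you want to save it
        page.render("screenshot.png");
        // page.content holds the HTML after the JavaScript has run
        console.log(page.content);
        // jQuery is usable here only if the page itself loads it; evaluate() can
        // return only serializable values, so return the comment text, not jQuery objects
        var fbcomments = page.evaluate(function () {
            return $(".fb-comments iframe").contents().find(".postContainer")
                .map(function () { return $(this).text(); }).get();
        });
        console.log(JSON.stringify(fbcomments));
        phantom.exit();
    }, 10000);
});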

To allow cross-domain interaction (i.e. with the Facebook iframe), you have to start PhantomJS with the option --web-security=no.
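
Since the question is about Java, one way to use PhantomJS from a Java program is to launch it as a child process and read what it prints. The following is only a sketch under assumptions: the phantomjs binary is on the PATH, and render.js is a hypothetical script that takes the URL as its first argument and prints the rendered HTML (for example page.content) to standard output. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

class PhantomLauncher {

    // Builds the command line. PhantomJS options such as --web-security=no
    // must come before the script path; the script's own arguments follow it.
    static List<String> buildCommand(String scriptPath, String url) {
        return List.of("phantomjs", "--web-security=no", scriptPath, url);
    }

    // Runs PhantomJS and returns everything the script wrote to stdout.
    // Assumes the script prints the rendered HTML and then calls phantom.exit().
    static String renderToHtml(String scriptPath, String url)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(scriptPath, url))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show PhantomJS errors on our stderr
                .start();
        String html = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("phantomjs exited with code " + exitCode);
        }
        return html;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: PhantomLauncher <url>");
            return;
        }
        // render.js is a hypothetical script name (assumption)
        System.out.println(renderToHtml("render.js", args[0]));
    }
}
```

Reading stdout keeps the Java side simple, but it means the script must not print anything other than the HTML.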

To communicate with other applications from PhantomJS, you can use a web server or make a POST request: https://github.com/ariya/phantomjs/blob/master/examples/post.js
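
On the Java side, the receiving end of such a POST could look roughly like the sketch below, which uses the HTTP server bundled with the JDK. The path /rendered, the class name and the idea of sending the rendered HTML as the request body are assumptions for illustration. In main, a Java HTTP client stands in for the request that PhantomJS would send.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class RenderedHtmlReceiver {

    // Bodies of the POST requests received so far, in arrival order.
    final BlockingQueue<String> received = new LinkedBlockingQueue<>();
    private final HttpServer server;

    // Port 0 asks the OS for any free port; "/rendered" is a made-up path.
    RenderedHtmlReceiver(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/rendered", exchange -> {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // -1 means no response body
                exchange.close();
                return;
            }
            byte[] body = exchange.getRequestBody().readAllBytes();
            received.add(new String(body, StandardCharsets.UTF_8));
            exchange.sendResponseHeaders(204, -1);
            exchange.close();
        });
    }

    void start() { server.start(); }

    int port() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }

    public static void main(String[] args) throws Exception {
        RenderedHtmlReceiver receiver = new RenderedHtmlReceiver(0);
        receiver.start();
        // Stand-in for the POST that PhantomJS would send with the rendered page.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + receiver.port() + "/rendered"))
                .POST(HttpRequest.BodyPublishers.ofString("<html><body>rendered</body></html>"))
                .build();
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
        // take() blocks until a page has been posted
        System.out.println(receiver.received.take());
        receiver.stop();
    }
}
```

The queue decouples the HTTP handler from the code that processes the pages, so the scraper can wait for the next page with take() or poll() on its own thread.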



Answer 2:

You can use HtmlUnit, a Java-based "GUI-less browser". It loads the page the way a web browser does, executing its JavaScript, so you can easily get the final rendered output of any page. You can disable JavaScript execution if you don't want it.

UPDATE: You asked for an example. You don't have to do anything extra:

Example:

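WebClient webClient = new WebClient();
HtmlPage myPage = webClient.getPage(myUrl);
webClient.waitForBackgroundJavaScript(10000); // give asynchronous scripts up to 10 s to finish
String finalHtml = myPage.asXml(); // page markup after the JavaScript has run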

UPDATE 2: You can get the page inside an iframe as follows:

HtmlPage myFrame = (HtmlPage) myPage.getFrameByName(myIframeName).getEnclosedPage();

Please read the HtmlUnit documentation for details. When it comes to getting page content, there is very little HtmlUnit cannot do.