Java HTML parser for reading JavaScript-generated content

Posted 2019-05-23 16:42

I am using jsoup to read a web page with the following function.

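import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Fetches and parses the page at url; returns null if the request or parsing fails.
public Document getDocument(String url) {
    try {
        return Jsoup.connect(url)
                .timeout(20 * 1000)   // 20-second timeout, in milliseconds
                .userAgent("Mozilla")
                .get();
    } catch (Exception e) {
        return null;
    }
}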

But whenever I try to read a web page that contains JavaScript-generated content, jsoup does not read that content. That is, the actual content of the page is loaded by JavaScript calls, so it is not present in the page source of that link. For example, this blog: http://blog.rapporter.net/search/label/r. Is there a way to also get JavaScript-generated content when parsing a page with jsoup? If not, please suggest a Java HTML parser that can solve this problem.

1 answer
We Are One
#2 · 2019-05-23 17:29

You cannot do this with Jsoup. Jsoup parses HTML; to wait for AJAX requests or JavaScript-generated content in general, you would need a browser that can execute the JavaScript and produce its output. JavaScript logic can be complex, so executing it and loading the resulting content is not trivial (just look at how complicated browsers, JS, and the DOM are).
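
To make the point concrete, here is a minimal sketch of why a static parser comes back empty-handed. The page source, the element id `posts`, and the helper `staticDivText` are all hypothetical, made up for illustration; a regex stands in for the parser only to keep the example dependency-free, and it is not how Jsoup works internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaticHtmlDemo {

    // Hypothetical page source: the <div> stays empty until a browser runs the script.
    static final String PAGE_SOURCE =
            "<html><body>"
            + "<div id=\"posts\"></div>"
            + "<script>document.getElementById('posts').textContent = 'Loaded by JavaScript';</script>"
            + "</body></html>";

    // Returns the text inside <div id="..."> as it appears in the raw markup,
    // without executing any script; returns null if no such div is found.
    // The regex is an illustration only, not a real HTML parser.
    static String staticDivText(String html, String id) {
        Pattern p = Pattern.compile(
                "<div id=\"" + Pattern.quote(id) + "\">(.*?)</div>", Pattern.DOTALL);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = staticDivText(PAGE_SOURCE, "posts");
        // The div is empty in the source; only the script that would fill it is there.
        System.out.println("Static content of #posts: [" + text + "]");
        System.out.println("Script present in source: " + PAGE_SOURCE.contains("<script>"));
    }
}
```

The text "Loaded by JavaScript" exists only as a string inside the script. It becomes page content only after something executes that script against a DOM, which is the job of a browser.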
