Skip execution of particular JavaScript files in HtmlUnit

Posted 2020-06-18 10:36

Question:

I have a URL and want to fetch its page source after its JavaScript has executed.

Fetch Page source using HtmlUnit : URL got stuck

Initially I suspected that the URL was getting stuck because of limited system resources and high CPU usage.

Then I tried running it on HtmlUnit 2.9 and 2.11. It got stuck on both while parsing. See the question above for the HtmlUnit scraping code that gets stuck.

Now I suspect that this might be due to JavaScript execution going into an infinite loop.

I want to find out which JS files are causing the problem and exclude them from execution.

If they are scripts for services like Google Analytics, Twitter, etc., I may not need them at all.

So I want a way to tell HtmlUnit to ignore certain JS files and execute the rest.

Does anybody know how to do that?

Answer 1:

Try this. It worked for me:

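import java.io.IOException;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.FalsifyingWebConnection;

// Wraps the client's connection and answers requests for one script with an empty body.
class InterceptWebConnection extends FalsifyingWebConnection {
    public InterceptWebConnection(WebClient webClient) throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Check the URL before fetching so the unwanted script is never downloaded.
        if (request.getUrl().toString().endsWith("dom-drag.js")) {
            // Empty script body: HtmlUnit has nothing to execute for this file.
            return createWebResponse(request, "", "application/javascript", 200, "OK");
        }
        // Every other request is fetched once, as usual.
        return super.getResponse(request);
    }
}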

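Then, when setting up your WebClient, create the connection. The constructor inherited from WebConnectionWrapper installs it on the client, so no further call is needed: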

new InterceptWebConnection(webClient);
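
To skip several scripts rather than one, the URL test inside getResponse could be moved into a small helper. The following is only a sketch using the Java standard library: the class name ScriptBlocklist, the method shouldSkip, and the listed hosts and file names are hypothetical examples and not part of HtmlUnit.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not an HtmlUnit class): decides which script URLs to skip.
final class ScriptBlocklist {
    // Example entries only; replace them with the scripts that actually hang.
    private static final List<String> BLOCKED_HOSTS =
            Arrays.asList("google-analytics.com", "platform.twitter.com");
    private static final List<String> BLOCKED_FILES =
            Arrays.asList("dom-drag.js");

    private ScriptBlocklist() {
    }

    static boolean shouldSkip(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return false; // unparsable URL: do not block it
        }
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getPath() == null ? "" : uri.getPath().toLowerCase(Locale.ROOT);
        // Block a listed host and any of its subdomains.
        for (String blocked : BLOCKED_HOSTS) {
            if (host.equals(blocked) || host.endsWith("." + blocked)) {
                return true;
            }
        }
        // Block a listed file name, compared against the path only.
        for (String file : BLOCKED_FILES) {
            if (path.endsWith("/" + file)) {
                return true;
            }
        }
        return false;
    }
}
```

With this in place, the condition in getResponse would become ScriptBlocklist.shouldSkip(request.getUrl().toString()). Because the file name is compared against the URL path, a query string such as ?v=2 does not defeat the match, which a plain endsWith on the whole URL would miss.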


Tags: htmlunit