Error while using HtmlUnit

Posted 2019-09-18 16:36

When I run this simple code to get the text content of a website, it shows an error that I cannot understand.

import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.ScriptException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class sd {
    public static void main(String[] args) {
        sd vip=new sd();
        try {
            vip.homePage();
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.print("sssss");
    }

    public void homePage() throws Exception, ScriptException {
        final WebClient webClient = new WebClient();
        final HtmlPage page =
            (HtmlPage) webClient.getPage("http://timesofindia.indiatimes.com/");
        String pageAsText = page.asText();
        String pageAsXML = page.asXml();

        // System.out.println(pageAsXML);
        System.out.println("////////////////////output//////////////////////////"); 
        System.out.println(pageAsText);
        // System.out.println(pageAsXML);
        System.out.println("////////////////////output ends//////////////////////////"); 
    }

}

The error I get:

   ======= EXCEPTION START ========
Exception class=[com.gargoylesoftware.htmlunit.ScriptException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxFunction_write
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)
Caused by: java.lang.RuntimeException: Exception invoking jsxFunction_write
Caused by: com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxFunction_write
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)

Answer 1:

Configure your WebClient not to throw exceptions on JavaScript errors:

webClient.setThrowExceptionOnScriptError(false);

If that is not enough, initialize your WebClient so that it behaves like Firefox:

webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
// or, depending on the HtmlUnit version:
webClient = new WebClient(BrowserVersion.FIREFOX_10);



Answer 2:

The WebClient::setThrowExceptionOnScriptError method has been deprecated since HtmlUnit 2.11. In newer versions, use the following instead:

webClient.getOptions().setThrowExceptionOnScriptError(false);
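
For reference, here is a minimal sketch of the question's program with this option applied. It assumes HtmlUnit 2.11 or later in the 2.x line (the `com.gargoylesoftware` packages); the class name `QuietFetch` is made up for illustration. It only stops script errors from aborting the page load and does not guarantee that the page renders completely.

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Hypothetical class name; assumes HtmlUnit 2.11+ (2.x) on the classpath.
public class QuietFetch {
    public static void main(String[] args) throws Exception {
        final WebClient webClient = new WebClient();

        // Do not throw ScriptException when the page's JavaScript fails;
        // the page load continues instead.
        webClient.getOptions().setThrowExceptionOnScriptError(false);

        final HtmlPage page = webClient.getPage("http://timesofindia.indiatimes.com/");

        // Same text extraction as in the question.
        System.out.println(page.asText());
    }
}
```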


Answer 3:

I ran into this error too. Setting the WebClient to suppress script errors works for basic websites, but it fails as the website becomes more complex.

After several attempts, I finally switched to PhantomJS, which is written in C++. I wrote some scripts and executed them with phantomjs. Each script loads the URL and writes the page data to a file.

Once that file is ready, a Java program loads the file data and performs my operations on it. I used Jsoup for loading and scraping the data.

HtmlUnit, Jaunt, and Jsoup handle HTML and CSS well. What they are missing is complete JavaScript support. That is the main reason for errors such as thrown exceptions, pages that do not load completely, and so on.
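
The PhantomJS workflow described in this answer could be driven from Java roughly as follows. This is only a sketch using the standard library: the class name `PhantomFetch`, the script name `save_page.js`, and the convention that the script takes a URL and an output path as its two arguments are all assumptions, not something from the original setup. The Jsoup step is left out; the returned string is what you would hand to Jsoup for parsing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs a PhantomJS script, then reads the file it wrote.
public class PhantomFetch {

    // Builds: phantomjs <script> <url> <outputFile>
    // The (url, outputFile) argument order is an assumed script convention.
    static List<String> buildCommand(String script, String url, Path out) {
        return Arrays.asList("phantomjs", script, url, out.toString());
    }

    // Runs the script, waits for it to finish, and returns the saved page.
    static String render(String script, String url, Path out)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(script, url, out))
                .inheritIO()   // show phantomjs output on our console
                .start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("phantomjs exited with status " + exit);
        }
        return new String(Files.readAllBytes(out), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.out.println("usage: PhantomFetch <script.js> <url> <output-file>");
            return;
        }
        String html = render(args[0], args[1], Paths.get(args[2]));
        System.out.println(html.length() + " characters rendered");
    }
}
```

It would be run as, for example, `java PhantomFetch save_page.js http://timesofindia.indiatimes.com/ page.html`, with phantomjs on the PATH.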



Source: Error while using HtmlUnit