
Getting HtmlUnit to run under Android

Posted 2020-07-22 10:47

Question:

I was wondering if anyone was able to make HtmlUnit run under Android?

I have a site which I am scraping using Jsoup (this works well). However, one of the sections spans more than 2 pages. The site uses ASP.NET, and the link that leads to the next page is a JavaScript postback. As a result, I need to somehow execute that JavaScript to get the next page's content. This is where my attempts at HtmlUnit come in.

The following code worked perfectly in desktop Java:

WebClient webClient = new WebClient();
webClient.setJavaScriptEnabled(true);
webClient.setThrowExceptionOnFailingStatusCode(false);
webClient.setThrowExceptionOnScriptError(false);

HtmlPage page = null;
try {
    page = webClient.getPage(URLOne.toString());
} catch (FailingHttpStatusCodeException e1) {
    e1.printStackTrace();
} catch (MalformedURLException e1) {
    e1.printStackTrace();
} catch (IOException e1) {
    e1.printStackTrace();
}

// Find the "next page" link that triggers the ASP.NET postback
HtmlAnchor anchor = page.getAnchorByHref("javascript:__doPostBack('lb_next','')");

try {
    // Clicking runs the JavaScript and returns the resulting page
    page = (HtmlPage) anchor.click();
} catch (IOException e) {
    e.printStackTrace();
}

webClient.closeAllWindows();

// Hand the rendered markup back to Jsoup for scraping
Document doc1 = Jsoup.parse(page.asXml());

When I set up the necessary libraries in Android, I had to remove xalan, xerces and xml-apis (see "HtmlUnit on Android"). If I keep them, I get the conversion to Dalvik error.

Without them the application runs in Android, but when it reaches the section that requires HtmlUnit I get several of the following errors in logcat:

Could not find method org.apache.http.conn.scheme.Scheme.<init>, referenced from method com.gargoylesoftware.htmlunit.HttpWebConnection.createHttpClient
Could not find method org.w3c.dom.css.CSSStyleDeclaration.getLength, referenced from method com.gargoylesoftware.htmlunit.javascript.host.css.ComputedCSSStyleDeclaration.applyStyleFromSelector
VFY: unable to find class referenced in signature (Lorg/w3c/dom/css/CSSStyleSheet;
VFY: unable to find class referenced in signature (Lorg/w3c/dom/css/CSSStyleDeclaration;

Then the application force closes. This issue is similar to the ones described in "How do I get HtmlUnit to work under Android?" and "HtmlUnit Android problem with WebClient".

The only reason I am using HtmlUnit is to be able to run the JavaScript on that page. I am open to any alternative that may allow me to do something similar.

Thanks

Answer 1:

DO NOT use HtmlUnit.

You might think you only need a couple of core jars. Nah, you may need all of them, otherwise you can run into class-not-found errors.

Just take a look at how many jars you have to load into Eclipse before you can run it! A total of 21 jars, over 10 MB! Bear in mind that an app package for Android Market is limited to 50 MB. It also slows Eclipse down, and you will probably have to increase the memory when you debug.

Use Jsoup instead!
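
For the `__doPostBack` link in the question, one possible way to stay with Jsoup is to submit the form yourself instead of executing the JavaScript. In ASP.NET Web Forms, `__doPostBack(target, argument)` normally just copies its two arguments into the hidden `__EVENTTARGET` and `__EVENTARGUMENT` fields and submits the page's form. The following is a minimal sketch under that assumption, using only the standard library so it also runs on Android. The class `PostbackForm` and its methods are hypothetical names made up for this illustration, and the hidden field values (such as `__VIEWSTATE`) are assumed to have been read from the first page beforehand.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: turns a __doPostBack link into a form body. */
final class PostbackForm {

    // Matches __doPostBack('target','argument') inside an href
    private static final Pattern POSTBACK = Pattern.compile(
            "__doPostBack\\('([^']*)'\\s*,\\s*'([^']*)'\\)");

    /** Returns {eventTarget, eventArgument}, or null if href is not a postback link. */
    static String[] parseHref(String href) {
        Matcher m = POSTBACK.matcher(href);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    /**
     * Builds an application/x-www-form-urlencoded body.
     * hiddenFields: the form's hidden inputs as scraped from the page (assumed).
     */
    static String buildBody(String href, Map<String, String> hiddenFields) {
        String[] parts = parseHref(href);
        if (parts == null) {
            throw new IllegalArgumentException("Not a __doPostBack link: " + href);
        }
        // Copy the hidden fields, then set the two fields the script would set
        Map<String, String> form = new LinkedHashMap<String, String>(hiddenFields);
        form.put("__EVENTTARGET", parts[0]);
        form.put("__EVENTARGUMENT", parts[1]);

        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : form.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return body.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(ex);
        }
    }
}
```

With Jsoup itself you would probably not need the encoded string at all: collect the hidden inputs from the first page (for example with a selector such as `input[type=hidden]`), add the two event fields, and send them with `Jsoup.connect(url).data(map).post()`. Whether the site accepts this depends on the page; some also require cookies or other fields from the first response, so treat this as a starting point rather than a guaranteed fix.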