Parse HTML in Android

2018-12-31 08:39发布

I am trying to parse HTML in android from a webpage, and since the webpage it not well formed, I get SAXException.

Is there a way to parse HTML in Android?

5条回答
人间绝色
2楼-- · 2018-12-31 09:01

Maybe you can use WebView, but as you can see in the doc WebView doesn't support javascript and other stuff like widgets by default.

http://developer.android.com/reference/android/webkit/WebView.html

I think that you can enable javascript if you need it.

查看更多
梦该遗忘
3楼-- · 2018-12-31 09:06
String tmpHtml = "<html>a whole bunch of html stuff</html>";
String htmlTextStr = Html.fromHtml(tmpHtml).toString();
查看更多
无色无味的生活
4楼-- · 2018-12-31 09:11

I just encountered this problem. I tried a few things, but settled on using JSoup. The jar is about 132k, which is a bit big, but if you download the source and take out some of the methods you will not be using, then it is not as big.
=> Good thing about it is that it will handle badly formed HTML

Here's a good example from their site.

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

//http://jsoup.org/cookbook/input/load-document-from-url
//Document doc = Jsoup.connect("http://example.com/").get();

Element content = doc.getElementById("content");
Elements links = content.getElementsByTag("a");
for (Element link : links) {
  String linkHref = link.attr("href");
  String linkText = link.text();
}
查看更多
大哥的爱人
5楼-- · 2018-12-31 09:13

We all know that programming have endless possibilities.There are numbers of solutions available for a single problem so i think all of the above solutions are perfect and may be helpful for someone but for me this one save my day..

So Code goes like this

  private void getWebsite() {
    new Thread(new Runnable() {
      @Override
      public void run() {
        final StringBuilder builder = new StringBuilder();

        try {
          Document doc = Jsoup.connect("http://www.ssaurel.com/blog").get();
          String title = doc.title();
          Elements links = doc.select("a[href]");

          builder.append(title).append("\n");

          for (Element link : links) {
            builder.append("\n").append("Link : ").append(link.attr("href"))
            .append("\n").append("Text : ").append(link.text());
          }
        } catch (IOException e) {
          builder.append("Error : ").append(e.getMessage()).append("\n");
        }

        runOnUiThread(new Runnable() {
          @Override
          public void run() {
            result.setText(builder.toString());
          }
        });
      }
    }).start();
  }

You just have to call the above function in onCreate Method of your MainActivity

I hope this one is also helpful for you guys.

Also read the original blog at Medium

查看更多
荒废的爱情
6楼-- · 2018-12-31 09:16

Have you tried using Html.fromHtml(source)?

I think that class is pretty liberal with respect to source quality (it uses TagSoup internally, which was designed with real-life, bad HTML in mind). It doesn't support all HTML tags though, but it does come with a handler you can implement to react on tags it doesn't understand.

查看更多
登录 后发表回答