
HTML parsing in Android

Posted 2019-02-20 19:03

Question:

I am trying to learn how to parse HTML, but as I don't have much experience in either Java or Android, it's a little complicated. I have read the IBM XML parsing tutorial and have learned to parse an RSS feed. My problem is that I would like to get data from an HTML site. I have read some information on HTMLCleaner, JSON, etc., but I can't find a good tutorial to help me. Do you know of any tutorials that might be helpful?

Thanks.

Answer 1:

Check out the following HTML parsers. There are more out there. Maybe one will work for you:

  • HTMLCleaner: http://htmlcleaner.sourceforge.net/

  • TagSoup: http://ccil.org/~cowan/XML/tagsoup/

  • Jericho: http://jericho.htmlparser.net/docs/index.html



Answer 2:

IMO there are two easy ways to parse HTML:

  • Convert the HTML to XML (XHTML) using a library (e.g. HTMLTidy) and then use an XML parser
  • Use an existing HTML parser (e.g. the one in a standard web browser such as WebKit, Firefox, or IE) and then read the "DOM", which is a more-or-less API-friendly representation of the parsed HTML
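
To illustrate the second half of the first option, here is a minimal sketch of reading links out of markup that is already well-formed XHTML, using only the standard javax.xml DOM parser. It assumes the tidy step has already happened (it does not call HTMLTidy), and the class name XhtmlLinkExtractor, the method extractLinks, and the sample markup are made up for illustration. Real pages with a DOCTYPE or entities such as &nbsp; may need extra parser configuration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of any library mentioned in this thread.
class XhtmlLinkExtractor {

    // Returns one "text -> href" entry per <a> element, in document order.
    // Assumption: the input is well-formed XML; raw HTML will make parse() throw.
    static List<String> extractLinks(String xhtml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList anchors = doc.getElementsByTagName("a");
        List<String> links = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            links.add(a.getTextContent().trim() + " -> " + a.getAttribute("href"));
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input standing in for the output of a tidy library.
        String xhtml = "<html><body>"
                + "<p><a href=\"http://example.com/a\">First</a></p>"
                + "<a href=\"/b\"> Second </a>"
                + "</body></html>";
        for (String link : extractLinks(xhtml)) {
            System.out.println(link);
        }
    }
}
```

On Android you would typically download the page on a background thread, run it through the tidy library of your choice, and pass the resulting string to a method like this one.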

Alternatively, if you want to write your own parser (which I doubt you should do for homework: it would be long and complicated to implement properly and completely), see the specs for parsing HTML.