
Parsing content that contains HTML tags with an XML parser

Posted 2019-07-23 08:15

Question:

I am building an Android app that uses XmlPullParser.

How can I get the content out of HTML formatted like this?

<div class="content">
"Some text is here."
<br>
"some more text "<a class="link" href="adress">continues here</a>
<br>
</div>

I want to extract all of the content like this:

"Some text is here. 
 some more text continues here"

The "continues here" part should also remain a hyperlink.

Addition after some comments: the HTML is first passed through Yahoo YQL, which generates XML. My code works on that generated XML file, and the part shown above that I want to parse comes from it.
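
Since the content reaches the app as YQL-generated XML, a pull parser can work, provided that XML is well-formed (in particular, that every <br> is closed as <br/>); that is an assumption here, not something the question confirms. Android's XmlPullParser (org.xmlpull.v1) is not part of the standard JDK, so this minimal sketch swaps in the JDK's StAX XMLStreamReader, which follows the same pull model; on Android the loop maps onto XmlPullParser's START_TAG, END_TAG and TEXT events with getName(), getAttributeValue() and getText(). It also assumes the double quotes in the snippet are how a browser displays text nodes, not literal characters. ContentExtractor, Link, Result and extract are hypothetical names. The sketch collects the text inside <div class="content">, turns each <br/> into a line break, and records each <a> as character offsets plus its href so the link can be reapplied later.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: a sketch assuming the YQL output is well-formed XML.
public class ContentExtractor {

    // A hyperlink covering text.substring(start, end).
    public static final class Link {
        public final int start, end;
        public final String href;
        Link(int start, int end, String href) { this.start = start; this.end = end; this.href = href; }
    }

    // Plain text of the content div(s) plus the links found inside.
    public static final class Result {
        public final String text;
        public final List<Link> links;
        Result(String text, List<Link> links) { this.text = text; this.links = links; }
    }

    // Throws IllegalArgumentException if the input is not well-formed XML.
    public static Result extract(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, true); // one event per text run
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);  // no DTDs or external entities
        StringBuilder sb = new StringBuilder();
        List<Link> links = new ArrayList<>();
        int depth = 0;      // > 0 while inside <div class="content">
        int linkStart = -1; // offset where the current <a> began, or -1
        String href = null;
        try {
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = r.getLocalName();
                    if (depth == 0) {
                        if (name.equals("div") && "content".equals(r.getAttributeValue(null, "class"))) depth = 1;
                        continue; // ignore everything outside the content div
                    }
                    depth++;
                    if (name.equals("br")) {
                        trimTrailingSpaces(sb);
                        sb.append('\n');
                    } else if (name.equals("a")) {
                        linkStart = sb.length();
                        href = r.getAttributeValue(null, "href");
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && depth > 0) {
                    depth--;
                    if (r.getLocalName().equals("a") && linkStart >= 0) {
                        int end = sb.length();
                        // Keep trailing whitespace out of the link so later trimming can't cut into it.
                        while (end > linkStart && Character.isWhitespace(sb.charAt(end - 1))) end--;
                        if (end > linkStart) links.add(new Link(linkStart, end, href));
                        linkStart = -1;
                    }
                } else if ((event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) && depth > 0) {
                    appendCollapsed(sb, r.getText());
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        while (sb.length() > 0 && Character.isWhitespace(sb.charAt(sb.length() - 1))) sb.setLength(sb.length() - 1);
        return new Result(sb.toString(), links);
    }

    // Collapses whitespace runs to one space, dropping a leading space after a space or line break.
    private static void appendCollapsed(StringBuilder sb, String raw) {
        String text = raw.replaceAll("\\s+", " ");
        int n = sb.length();
        if (text.startsWith(" ") && (n == 0 || sb.charAt(n - 1) == ' ' || sb.charAt(n - 1) == '\n')) {
            text = text.substring(1);
        }
        sb.append(text);
    }

    private static void trimTrailingSpaces(StringBuilder sb) {
        while (sb.length() > 0 && sb.charAt(sb.length() - 1) == ' ') sb.setLength(sb.length() - 1);
    }
}
```

On Android, one option for keeping "continues here" clickable is to wrap the text in a SpannableStringBuilder and call setSpan(new URLSpan(link.href), link.start, link.end, Spanned.SPAN_EXCLUSIVE_EXCLUSIVE) for each recorded link.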

Answer 1:

Although HTML and XML share common syntax in some cases, they are different languages. I don't think XmlPullParser is a good choice for this purpose; I recommend using one of the several Java HTML parsers instead.
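
Answer 1 doesn't name a parser. For illustration, this sketch swaps in the HTML parser that ships with the desktop JDK (javax.swing.text.html.parser.ParserDelegator), which tolerates the unclosed <br> tags in the question's snippet. It is not available on Android, where a third-party library such as jsoup or TagSoup would fill the same role. LenientHtmlText and extract are hypothetical names. The sketch keeps only text and line breaks; capturing the link would mean also overriding handleStartTag and handleEndTag for HTML.Tag.A and reading the HTML.Attribute.HREF attribute.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's lenient (DTD-driven) Swing HTML parser.
public class LenientHtmlText {

    // Returns the text of an HTML fragment, turning each <br> into a line break.
    public static String extract(String html) {
        StringBuilder sb = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                sb.append(data); // a run of text between tags
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <br> have no end tag and are reported here.
                if (tag == HTML.Tag.BR) sb.append('\n');
            }
        };
        try {
            // The third argument tells the parser to ignore any charset declaration.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```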



Answer 2:

XmlPullParser is meant for XML. Well-structured XHTML pages are rare on the web. An XML parser expects well-formed data and is not designed to be fault tolerant, whereas HTML is usually loosely structured; the unclosed <br> tags in the question's snippet, for example, are legal HTML but invalid XML.

So no, it's not a good idea. You should prefer other libraries such as TagSoup or geronimo.
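
To illustrate the strictness described above, this sketch runs text through the JDK's SAX parser, a standard XML parser that, like XmlPullParser, requires well-formed input, and reports whether the text is well-formed. StrictXmlCheck and isWellFormed are hypothetical names. Fed the question's snippet as-is, it returns false, because the unclosed <br> is a fatal error in XML; closing each tag as <br/> makes the same content acceptable.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper showing that a strict XML parser rejects typical HTML.
public class StrictXmlCheck {

    // Returns true if the text is well-formed XML; one unclosed tag makes it false.
    public static boolean isWellFormed(String text) {
        try {
            // DefaultHandler rethrows fatal errors, so nothing is printed to stderr.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // The parser stops at the first well-formedness error; it does not try to recover.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```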

PS: when you ask a Stack Overflow question, it's best to try something yourself first and ask only if you get stuck, not the other way around.