Best practices for parsing XML

Posted 2019-01-18 16:43

My application needs to parse XML received via HTTP. As far as I understand, there are three major ways of parsing XML:

  • SAX
  • DOM
  • XmlPullParser

It is said that SAX is the fastest of these while DOM is not optimal for larger XML documents. But what is a large XML document in terms of parsing? What would be a recommended parser for the following?

  • XML document size between 1-5 kB
  • Easy traversing through the document, i.e. I need to know not only the current element but also the parent elements.

2 answers
贪生不怕死
#2 · 2019-01-18 17:31

As far as I understand there are three major ways of parsing XML:
- SAX
- DOM
- XmlPullParser

Wrong! None of those is the best way. What you really want is annotation-based parsing using the Simple XML Framework. To see why, follow this logic:

  1. Java works with objects.
  2. XML can be represented using Java objects. (see JAXB)
  3. Annotations could be used to map that XML to your Java objects and vice versa.
  4. The Simple XML Framework uses Annotations to allow you to map your Java and XML together.
  5. Simple XML is capable of running on Android (unlike JAXB).
  6. You should use Simple XML for all of your XML needs on Android.

And to help you do exactly that I will point you to my own blog post that explains exactly how to use the Simple library on Android.

Unless you have a 100 MB XML file, Simple will be more than fast enough for you. It is for me; I use it on all of my Android XML projects.

N.B. I should point out that if you require the user to download XML files that are more than 1MB on Android then you may want to rethink your strategy. You might be doing it wrong.

爷的心禁止访问
#3 · 2019-01-18 17:37

I'm afraid this is a case of "it depends".

As a rule of thumb, building a DOM tree from an XML document in Java consumes between 4 and 10 times the document's native size in memory (assuming Western text and UTF-8 encoding), depending on the underlying implementation. By that rule a 5 kB document would occupy roughly 20 to 50 kB, so for the small documents you mention it will not be a problem unless speed and memory use are critical.
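
Since the question asks for access to parent elements, here is a minimal sketch of how that looks with the JDK's built-in DOM parser. The class name `DomParentDemo`, the helper `pathTo`, and the sample XML are made up for illustration; only the `javax.xml.parsers` and `org.w3c.dom` calls are standard.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class DomParentDemo {

    /** Returns the path from the root to the first element named tagName, or null if absent. */
    static String pathTo(String xml, String tagName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        Node node = doc.getElementsByTagName(tagName).item(0);
        if (node == null) {
            return null;
        }
        // Walk upwards through the parents; the loop stops at the Document node,
        // which is not an Element.
        Deque<String> names = new ArrayDeque<>();
        for (Node n = node; n instanceof Element; n = n.getParentNode()) {
            names.push(n.getNodeName());
        }
        return "/" + String.join("/", names);
    }

    public static void main(String[] args) {
        String xml = "<order><customer><name>Ann</name></customer></order>";
        System.out.println(pathTo(xml, "name")); // prints /order/customer/name
    }
}
```

Because the whole tree is in memory, `getParentNode()` is available on every node, which is the main convenience DOM offers for small documents.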

DOM is generally regarded as quite an unpleasant way to work with XML. For background you might want to look at Elliotte Rusty Harold's presentation: What's Wrong with XML APIs (and how to fix them).

However, using SAX can be even more tedious, because the document is processed one event at a time and you have to keep track of context (such as the parent elements) yourself. On the other hand, SAX is fast and consumes very little memory. If you can find a pull parser you like, then by all means try that.
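
To give an idea of the bookkeeping SAX needs, here is a rough sketch that keeps a stack of open elements so the handler always knows the parents of the current element. `SaxPathDemo` and `elementPaths` are illustrative names of my own; the parser and handler classes are the standard JDK ones.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
final class SaxPathDemo {

    /** Returns the full path of every element, in document order. */
    static List<String> elementPaths(String xml) {
        List<String> paths = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // currently open elements, root first

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                open.addLast(qName);
                // Everything before the last entry in 'open' is an ancestor of this element.
                paths.add("/" + String.join("/", open));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.removeLast();
            }
        };

        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(elementPaths("<a><b/><c><d/></c></a>"));
        // prints [/a, /a/b, /a/c, /a/c/d]
    }
}
```

The stack is the only state held here, so memory use stays small regardless of document size; the price is that you must maintain it yourself.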

Another approach (not super-efficient, but clean and maintainable) is to build an in-memory tree of your XML (using DOM, say) and then use XPath expressions to select the information you are interested in.
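
A minimal sketch of that DOM-plus-XPath approach, using only the JDK's `javax.xml.xpath` package. The class `XPathDemo`, the helper `select`, and the sample document are invented for the example; treat it as a starting point rather than a finished implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class XPathDemo {

    /** Parses xml into a DOM tree and returns the string value of the XPath expression. */
    static String select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order id=\"7\"><customer><name>Ann</name></customer></order>";
        System.out.println(select(xml, "/order/customer/name")); // prints Ann
        System.out.println(select(xml, "/order/@id"));           // prints 7
        // '..' selects the parent, so parent elements are reachable from any match.
        System.out.println(select(xml, "name(//name/..)"));      // prints customer
    }
}
```

Re-parsing on every call is wasteful; in real code you would parse once and reuse the `Document` for several expressions.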
