Java - XML parser performance : Sun Java Streaming

Posted 2019-04-08 13:23

Question:

I am looking for an up-to-date, memory-efficient, high-performance Java XML parsing API. I need to parse XML files of 3 MB to 5 MB.

I searched for this and learned that the Sun Java Streaming XML Parser (SJSXP) and Woodstox are reported to be much faster than DOM and SAX parsers. Both implement the StAX API. Note: from what I found, schema validation is not supported by these technologies.

The Aalto XML processor also implements the StAX API.

I have not found any concrete performance figures for these technologies.

Which one is best in terms of memory efficiency, performance, and ease of use?

Answer 1:

Here are some more links that might be relevant:

  • Stax impls for data-binding: http://technotes.blogs.sapo.pt/1708.html
  • Using Woodstox efficiently: http://www.cowtowncoder.com/blog/archives/2006/06/entry_2.html
  • Speeding up XSLT with Woodstox: http://www.cowtowncoder.com/blog/archives/2009/04/entry_235.html

As to performance: SJSXP is the slowest of the three; it is essentially the internals of Xerces, repackaged and wrapped in the StAX API. This has some negative effects on performance, since Xerces was not really designed for pull parsing. Woodstox is a bit faster: much faster for small documents and for writing, with less of a difference when parsing longer documents.
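For readers unfamiliar with pull parsing, here is a minimal sketch of the StAX cursor API using only the standard `javax.xml.stream` classes. The class name `StaxCountExample`, the method `countElements`, and the sample document are made up for illustration. `XMLInputFactory.newInstance()` returns whichever StAX implementation the standard lookup finds (the JDK's bundled one if nothing else is on the classpath), so the same code should work unchanged with a different implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCountExample {

    // Counts start tags with the given local name. The reader pulls one event
    // at a time from the stream, so the whole document is never held in memory.
    static int countElements(XMLInputFactory factory, InputStream in, String localName)
            throws XMLStreamException {
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input; a real caller would pass a FileInputStream.
        String xml = "<orders><order id=\"1\"/><order id=\"2\"/><note>hi</note></orders>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println(countElements(factory, in, "order")); // prints 2
    }
}
```

Factories are relatively expensive to create, so in a real application it is usually worth creating one `XMLInputFactory` and reusing it for every document.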

Aalto is by far the fastest of the three, especially for parsing. It is commonly 50% to 100% faster than either Woodstox or SJSXP. One downside is that it does not handle DTDs, and therefore does not handle external entities either; it does handle pre-defined and character entities.
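To make the entity distinction concrete, here is a small sketch, run against the StAX implementation found by `XMLInputFactory.newInstance()` (the JDK's bundled one by default), of a document that needs no DTD at all. It turns off DTD and external-entity support through the standard StAX properties and still gets pre-defined entities (`&lt;`, `&amp;`) and character references (`&#65;`) expanded. The class and method names are hypothetical, and how other implementations treat these two properties is not verified here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxNoDtdExample {

    // Builds a factory with DTD processing and external entities switched off,
    // using only property names defined by the StAX API itself.
    static XMLInputFactory newFactoryWithoutDtd() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        return factory;
    }

    // Returns the text content of the root element (which must contain only text).
    static String rootText(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                newFactoryWithoutDtd().createXMLStreamReader(new StringReader(xml));
        try {
            // Advance to the root element's start tag.
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // skip anything before the root element (comments, etc.)
            }
            return reader.getElementText();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pre-defined entities and a character reference; no DOCTYPE involved.
        System.out.println(rootText("<msg>1 &lt; 2 &amp; &#65;</msg>")); // prints 1 < 2 & A
    }
}
```

If your documents look like this (no DOCTYPE, no custom entities), the lack of DTD handling should not matter in practice.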

Disclaimer: I am the author of Woodstox and Aalto, as well as a contributor to SJSXP (bug fixes).



Answer 2:

Some helpful links for the questions above:

http://www.developerfusion.com/article/84523/stax-the-odds-with-woodstox/ (June 2010)

http://www.ibm.com/developerworks/opensource/library/os-ag-renegade15/ (July 2007)

Performance benchmark details:

http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html (May 2007)