How to deserialize Java objects from XML?

Posted 2019-02-20 05:33

I'm sure this might have been discussed at length or answered before, however I need a bit more information on the best approach for my situation...

Problem:
We have some large XML data (anywhere from 100k to 5MB) which we need to inflate into Java objects. The issue is that the data really doesn't map onto an object very well at all, so we only need to pull certain parts of the data out and create the objects from those. Given that, solutions such as JAXB or XStream really aren't appropriate.

So, we need to pull XML data out and get it into Java objects as efficiently as possible.


Possible Solutions:
The way I see it, we have 3 possible solutions:

  • SAX parsing
  • DOM parsing
  • XSLT

We can load the XML into any JAXP implementation and pull the data out using one of the above methods.


Question(s)
I have a few questions/concerns:

  • How does XSLT work under the hood? Is it just a DOM parser? I ask because XSLT seems like a good way to go, but I don't really want to consider it if it won't give us better performance than DOM.
  • What are some popular libraries that provide DOM, XSLT, and SAX XML parsers?
  • In your experience, what are the reasons for picking DOM, SAX, or XSLT? Does the ease of use of DOM or XSLT totally dominate the performance improvements SAX offers?
  • Any benchmarks out there? The ones I've found are old (as in, 8 years old). So some recent benchmarks would be appreciated.
  • Are there any other solutions besides those outlined above that I could be missing?


Edit:
A few clarifications... You can use XSLT to directly inject values into a Java object. It is normally used to transform XML into some other XML; however, I'm talking about calling a Java method from within the XSLT to inject the value.

I'm still not clear on how an XSLT processor works exactly... How is it feeding the XML into the XSLT code you write?

5 answers
疯言疯语
Reply #2 · 2019-02-20 05:58

You can use the @XmlPath extension in EclipseLink JAXB (MOXy) to easily handle this use case. For a detailed example see:

Sample Code:

package blog.geocode;

import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

import org.eclipse.persistence.oxm.annotations.XmlPath;

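// The "ns" prefix used in the @XmlPath expressions below must be bound to the
// document's namespace URI (in MOXy, typically via @XmlSchema/@XmlNs in package-info.java).
@XmlRootElement(name="kml")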
@XmlType(propOrder={"country", "state", "city", "street", "postalCode"})
public class Address {

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:Thoroughfare/ns:ThoroughfareName/text()")
    private String street;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:LocalityName/text()")
    private String city;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:AdministrativeAreaName/text()")
    private String state;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:CountryNameCode/text()")
    private String country;

    @XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:PostalCode/ns:PostalCodeNumber/text()")
    private String postalCode;

}
叛逆
Reply #3 · 2019-02-20 06:05

DOM, SAX and XSLT are different animals.

DOM parsing loads the entire document into memory, which for 100K to 5MB (very small by today's standards) would work.

SAX is a stream parser which reads the XML and delivers events to your code for each tag.
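
To make the event model concrete, here is a minimal sketch of a SAX handler that keeps only the parts of a document it cares about. The `<title>` element and the class and method names are hypothetical, chosen only for illustration; the parser itself is the JAXP one that ships with the JDK.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxTitleExtractor {

    // Collects the text of every <title> element (a hypothetical name) and ignores the rest.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("title".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName) && text != null) {
                titles.add(text.toString().trim());
                text = null;
            }
        }
    }

    static List<String> extractTitles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TitleHandler handler = new TitleHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id=\"1\"><title>Dune</title><pages>412</pages></book>"
                   + "<book id=\"2\"><title>Emma</title><pages>320</pages></book></library>";
        System.out.println(extractTitles(xml));
    }
}
```

Nothing but the collected strings stays in memory, which is why SAX scales well; the cost is that the handler has to track its own state between events.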

XSLT is a system for transforming one XML tree into another. Even if you wrote a transform that converts the input to a more suitable format, you'd still have to write something using DOM or SAX to convert it into Java objects.
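
A rough sketch of that two-step flow, using only the JDK's built-in `javax.xml.transform` API: a stylesheet flattens the input into a simpler tree, and ordinary DOM code then reads the result. The stylesheet and the `customer`, `profile` and `fullName` element names are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XsltThenDom {

    // Hypothetical stylesheet: flattens nested customer records into <names><name>...</name></names>.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\">"
        + "<names><xsl:for-each select=\"//customer\">"
        + "<name><xsl:value-of select=\"profile/fullName\"/></name>"
        + "</xsl:for-each></names>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static List<String> customerNames(String xml) throws Exception {
        // Step 1: XSLT turns one XML tree into another (here, written into an empty DOM Document).
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        Document flat = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        transformer.transform(new StreamSource(new StringReader(xml)), new DOMResult(flat));

        // Step 2: plain DOM code is still needed to turn the result into Java values.
        NodeList nameNodes = flat.getElementsByTagName("name");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nameNodes.getLength(); i++) {
            names.add(nameNodes.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customers>"
                   + "<customer><profile><fullName>Ann</fullName></profile></customer>"
                   + "<customer><profile><fullName>Bob</fullName></profile></customer>"
                   + "</customers>";
        System.out.println(customerNames(xml));
    }
}
```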

你好瞎i
Reply #4 · 2019-02-20 06:18

JAXB, the Java Architecture for XML Binding, might be what you want. You use it to inflate an XML document into a Java object graph made up of "Java content objects". These content objects are instances of classes generated by JAXB to match the XML document's schema.

But if you already have a set of Java classes, or don't yet have a schema for the document, JAXB probably isn't the best way to go. I'd suggest doing a SAX parse and then building up your Java objects during the parse. Alternatively you could try a DOM parse and then walk the resulting Document tree to pull out the parts of interest (maybe with XPath) -- but 5MB of XML might turn into 50MB of DOM tree objects in Java.
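
As an illustration of the DOM route, the sketch below parses the whole document and then walks the tree to build a few objects by hand. The `Order` class and the `order`, `id` and `total` names are made up for the example and are not taken from the question's data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomOrderReader {

    // Hypothetical target class: only the fields we care about, not the whole document.
    static final class Order {
        final String id;
        final double total;

        Order(String id, double total) {
            this.id = id;
            this.total = total;
        }
    }

    static List<Order> readOrders(String xml) throws Exception {
        // The entire document is held in memory as a tree after this call.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<Order> orders = new ArrayList<>();
        NodeList orderNodes = doc.getElementsByTagName("order");
        for (int i = 0; i < orderNodes.getLength(); i++) {
            Element order = (Element) orderNodes.item(i);
            String id = order.getAttribute("id");
            // Assumes every <order> has exactly one <total> child with a numeric value.
            String total = order.getElementsByTagName("total").item(0).getTextContent();
            orders.add(new Order(id, Double.parseDouble(total.trim())));
        }
        return orders;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order id=\"A1\"><note>gift</note><total>9.50</total></order>"
                   + "<order id=\"B2\"><total>20</total></order></orders>";
        for (Order o : readOrders(xml)) {
            System.out.println(o.id + " " + o.total);
        }
    }
}
```

The code is short because the tree can be queried in any order, and that convenience is what the extra memory pays for.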

别忘想泡老子
Reply #5 · 2019-02-20 06:21

Use XSLT to transform the large XML files into a local domain model that is mapped to Java objects with JAXB.

Start with the JDK 5+ built-in XML libraries (unless you absolutely need XSLT 2.0, in which case use Saxon).

Don't focus on the relative performance of SAX/DOM; focus on learning how to write XPath expressions and use XSLT, and then worry about performance later if and only if you find it to be a problem.

The Eclipse XML editors are decent, but if you can afford it, spring for Oxygen XML, which will let you do XPath evaluation in realtime.

甜甜的少女心
Reply #6 · 2019-02-20 06:21

We had a similar situation and I just threw together some XPath code that parsed the stuff I needed.
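
The original code isn't shown, so the following is only a guess at what such a low-tech approach might look like in plain Java with the JDK's `javax.xml.xpath` API. The class name, method name and sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathPick {

    // Evaluates one XPath expression against the XML text and returns the result as a string.
    static String pick(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<report><header><author>Kim</author></header><stats count=\"3\"/></report>";
        System.out.println(pick(xml, "/report/header/author"));
        System.out.println(pick(xml, "/report/stats/@count"));
    }
}
```

Each call here re-parses the input, which is fine for a handful of values; when many values come from the same file, parsing once into a DOM `Document` and evaluating the expressions against that would avoid the repeated work.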

It was amazingly quick even on 100k+ XML files. We went as low tech as possible. We handle around 1000 files a day of that size and parsing time is very low. We have no memory issues, leaks etc.

We wrote a quick prototype in Groovy (if my memory is accurate); the proof of concept took me about 10 minutes.
