How efficient is XPath compared to using DOM in Do

Posted 2019-02-17 22:43

Question:

For example, consider the following XML:

<root>
  <childNode attribute1="value1">
     <grandChildNode attrib1="val1" attrib2="val2">some content1
     </grandChildNode>
     <grandChildNode attrib1="val1" attrib2="val2">some content2
     </grandChildNode>
     <grandChildNode attrib1="val1" attrib2="val2">some content3
     </grandChildNode>
  </childNode>
  <childNode attribute1="value1">
     <grandChildNode attrib1="val1" attrib2="val2">some content1
     </grandChildNode>
     <grandChildNode attrib1="val1" attrib2="val2">some content2
     </grandChildNode>
     <grandChildNode attrib1="val1" attrib2="val2">some content3
     </grandChildNode>
  </childNode>
  <childNode attribute1="value1">
     <grandChildNode attrib1="val1" attrib2="val2">some content1
     </grandChildNode>
     <grandChildNode attrib1="val1" attrib2="val2">some content2
     </grandChildNode>
     <grandChildNode attrib1="val1" attrib2="val2">some content3
     </grandChildNode>
  </childNode>
</root>

Which is more efficient: using DOM to get the root node and then looping through the childNode and grandChildNode elements, or using XPath expressions to gather the details of the child and grandchild nodes?
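To make the two options concrete, here is a minimal sketch of both in Python. It assumes the standard library's `xml.dom.minidom` for the DOM walk and `xml.etree.ElementTree` (which supports only a subset of XPath) for the path query. The function names are hypothetical and the sample is a shortened version of the XML above.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Shortened version of the sample document (two childNode elements).
SAMPLE = """<root>
  <childNode attribute1="value1">
     <grandChildNode attrib1="val1" attrib2="val2">some content1
     </grandChildNode>
     <grandChildNode attrib1="val1" attrib2="val2">some content2
     </grandChildNode>
  </childNode>
  <childNode attribute1="value1">
     <grandChildNode attrib1="val1" attrib2="val2">some content3
     </grandChildNode>
  </childNode>
</root>"""


def contents_via_dom(xml_text):
    """Walk root -> childNode -> grandChildNode by hand using the DOM API."""
    doc = xml.dom.minidom.parseString(xml_text)
    results = []
    for child in doc.documentElement.childNodes:
        # Skip the whitespace text nodes that sit between elements.
        if child.nodeType != child.ELEMENT_NODE or child.tagName != "childNode":
            continue
        for grand in child.childNodes:
            if grand.nodeType == grand.ELEMENT_NODE and grand.tagName == "grandChildNode":
                text = "".join(
                    n.data for n in grand.childNodes if n.nodeType == n.TEXT_NODE
                )
                results.append(text.strip())
    doc.unlink()  # release the DOM tree
    return results


def contents_via_xpath(xml_text):
    """Select the same elements with a single path expression."""
    root = ET.fromstring(xml_text)
    return [(g.text or "").strip() for g in root.findall("./childNode/grandChildNode")]


dom_result = contents_via_dom(SAMPLE)
xpath_result = contents_via_xpath(SAMPLE)
print(dom_result)
print(xpath_result)
```

Both functions return the same list. Note that in this sketch both libraries build a full in-memory tree before anything is queried, so the difference here is mostly in how much code you write, not in how the document is loaded.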

Answer 1:

If you want to process an XML document in its entirety, parsing the XML into a DOM will almost always be the least efficient option in terms of deserialisation time, CPU usage and memory usage.

Parsing to a DOM typically requires around 10-15 times as much memory as the XML document occupies on disk. For example, a 1 megabyte XML document will parse into a DOM taking up 10-15 megabytes of memory.
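If you want to check the memory overhead for your own data, a rough measurement is easy to set up. The sketch below assumes Python's `xml.dom.minidom` and `tracemalloc`; `build_xml` and `dom_memory_ratio` are hypothetical helpers, and the ratio you see will depend on the parser, the runtime and the shape of the document, so it will not necessarily match the 10-15x figure.

```python
import tracemalloc
import xml.dom.minidom


def build_xml(n_children):
    """Generate a document shaped like the question's sample (hypothetical helper)."""
    grand = '<grandChildNode attrib1="val1" attrib2="val2">some content</grandChildNode>'
    child = '<childNode attribute1="value1">' + grand * 3 + "</childNode>"
    return "<root>" + child * n_children + "</root>"


def dom_memory_ratio(xml_text):
    """Return peak bytes allocated while parsing, divided by the XML size in bytes."""
    xml_bytes = xml_text.encode("utf-8")
    tracemalloc.start()
    doc = xml.dom.minidom.parseString(xml_bytes)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    doc.unlink()  # release the DOM tree
    return peak / len(xml_bytes)


ratio = dom_memory_ratio(build_xml(2000))
print(f"DOM peak memory is about {ratio:.1f}x the document size")
```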

Only ever parse into a DOM if you intend to modify some or all of the data and then put the result back into an XML document. For all other use cases, DOM is a poor choice.

XPath is often significantly less resource heavy, but this does depend on the length of the document (i.e. how many 'childNode' elements you have) and the location in the document of the data in which you are interested.

XPath memory usage and completion time tend to increase the further down the document you go. For example, let's say you have an XML document with 20,000 childNode elements, each childNode has a unique identifier that you know in advance, and you want to extract a known childNode from the document. Extracting the 18,345th childNode would use far more memory than extracting the 3rd.
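As an illustration of that scenario, here is a minimal sketch that looks up one childNode by a known identifier. It assumes Python's `xml.etree.ElementTree` and a hypothetical `id` attribute (the sample in the question has no such attribute). ElementTree parses the whole document into a tree before the query runs, so this shows the shape of the lookup, not the memory behaviour of any particular XPath engine.

```python
import xml.etree.ElementTree as ET


def build_doc(n_children):
    """Build a document whose childNode elements carry a hypothetical unique id."""
    parts = ["<root>"]
    for i in range(1, n_children + 1):
        parts.append(
            f'<childNode id="c{i}"><grandChildNode>content {i}</grandChildNode></childNode>'
        )
    parts.append("</root>")
    return "".join(parts)


def find_child(root, child_id):
    """Return the childNode with the given id, or None if it is not present."""
    return root.find(f"./childNode[@id='{child_id}']")


root = ET.fromstring(build_doc(20000))
node = find_child(root, "c18345")
print(node.find("grandChildNode").text)  # prints "content 18345"
```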

So if you are using XPath to extract all childNode elements, you may find it less efficient than parsing into a DOM. XPath is generally an easy way of extracting a portion of an XML document. I wouldn't recommend using it to process an entire XML document.

By far the best approach, if you are indeed looking to extract and process all data in an XML document, would be to use a SAX-based reader. This will be both orders of magnitude faster and less resource heavy than any other approach.
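A SAX reader hands you events (element start, character data, element end) as it moves through the document, and you keep only what you need. Here is a minimal sketch, assuming Python's `xml.sax`; the handler class and `extract_contents` are hypothetical names, and the sample is a shortened version of the XML in the question.

```python
import xml.sax

SAMPLE = """<root>
  <childNode attribute1="value1">
     <grandChildNode attrib1="val1" attrib2="val2">some content1
     </grandChildNode>
     <grandChildNode attrib1="val1" attrib2="val2">some content2
     </grandChildNode>
  </childNode>
</root>"""


class GrandChildHandler(xml.sax.ContentHandler):
    """Collect the text of every grandChildNode without building a tree."""

    def __init__(self):
        super().__init__()
        self.contents = []
        self._buffer = None  # None means we are outside a grandChildNode

    def startElement(self, name, attrs):
        if name == "grandChildNode":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "grandChildNode" and self._buffer is not None:
            self.contents.append("".join(self._buffer).strip())
            self._buffer = None


def extract_contents(xml_text):
    handler = GrandChildHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.contents


contents = extract_contents(SAMPLE)
print(contents)
```

Only the text of the element currently being read is held in the buffer, which is why memory use stays roughly flat however many childNode elements the document contains.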

That said, it does also depend on the volume of data you are dealing with. For the example XML document you gave, you won't notice any practical difference. Yes, DOM will be 'slow' and SAX will be 'fast', but we're talking about differences of milliseconds or microseconds.

SAX can easily be hundreds or thousands of times faster than DOM; however, if that's the difference between 2 microseconds and 2 milliseconds, you're not going to notice. When you're dealing with a document containing 20,000 childNode elements, 2 seconds versus 200 seconds will become more of a problem.
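The only reliable way to know where your own documents fall is to time both approaches. Below is a rough sketch, assuming Python's `xml.dom.minidom` and `xml.sax`; all helper names are hypothetical, and the timings will vary by machine, parser and document, so treat the output as indicative only. The demo uses 2,000 childNode elements to keep the run short; raise `n_children` to 20,000 to match the scenario above.

```python
import time
import xml.dom.minidom
import xml.sax


def build_xml(n_children):
    """Generate a document shaped like the question's sample (hypothetical helper)."""
    grand = '<grandChildNode attrib1="val1" attrib2="val2">some content</grandChildNode>'
    child = '<childNode attribute1="value1">' + grand * 3 + "</childNode>"
    return ("<root>" + child * n_children + "</root>").encode("utf-8")


class CountingHandler(xml.sax.ContentHandler):
    """Count grandChildNode elements as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "grandChildNode":
            self.count += 1


def time_dom(xml_bytes):
    """Parse into a DOM, count grandChildNode elements, return (count, seconds)."""
    start = time.perf_counter()
    doc = xml.dom.minidom.parseString(xml_bytes)
    count = len(doc.getElementsByTagName("grandChildNode"))
    elapsed = time.perf_counter() - start
    doc.unlink()  # release the DOM tree
    return count, elapsed


def time_sax(xml_bytes):
    """Stream through the document with SAX, return (count, seconds)."""
    handler = CountingHandler()
    start = time.perf_counter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count, time.perf_counter() - start


data = build_xml(2000)
dom_count, dom_seconds = time_dom(data)
sax_count, sax_seconds = time_sax(data)
print(f"DOM: {dom_count} elements in {dom_seconds:.4f}s")
print(f"SAX: {sax_count} elements in {sax_seconds:.4f}s")
```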