I'm looking for an XPath evaluator that doesn't need to rebuild the whole DOM of a document in order to look up nodes: the goal is to manage a large amount of XML data (ideally over 2 GB) with the SAX model, which is very good for memory management, while still offering the possibility to search for nodes.
Thank you all for the support!
For all those who say it's not possible: after asking the question, I found a project named "saxpath" (http://www.saxpath.org/), but I can't find any project that implements it.
XPath DOES work with SAX, and most XSLT processors (especially Saxon and Apache Xalan) support executing XPath expressions inside XSLT on a SAX stream without building the entire DOM.
They manage to do this, very roughly, by buffering only the parts of the document that the expressions may still need.
How they buffer it is also very interesting, because while some simply create DOM fragments here and there, others use highly optimized tables for quick lookup and reduced memory consumption.
How much they manage to optimize largely depends on the kind of XPath queries they find. As the already-posted Saxon documentation clearly explains, queries that move "up" and then traverse the document "horizontally" (sibling by sibling) obviously require the entire document to be present, but most queries require only a few nodes to be kept in RAM at any moment.
I'm pretty sure of this because, back when I was building webapps every day with Cocoon, we hit the XSLT memory-footprint problem every time we used a "//something" expression inside an XSLT, and quite often we had to rework XPath expressions to allow better SAX optimization.
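As a toy illustration of that last point (this is not Saxon's or Xalan's internals, and the element names are invented), a fully rooted path such as /orders/order/total can be evaluated over a SAX stream while keeping nothing in memory but the current element stack, which is exactly why "//something" expressions are so much more expensive:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Evaluates the rooted path /orders/order/total over a SAX stream.
// Memory use is bounded by the element nesting depth, not the file size.
// (Toy example: assumes <total> contains only text, no child elements.)
public class RootedPathHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private boolean collecting;

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        path.push(qName);
        collecting = matches();
        if (collecting) text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (collecting) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (collecting) System.out.println("total = " + text);
        path.pop();
        collecting = false;
    }

    // True when the element stack is exactly orders/order/total (top of stack first).
    private boolean matches() {
        Object[] p = path.toArray();
        return p.length == 3 && "total".equals(p[0]) && "order".equals(p[1]) && "orders".equals(p[2]);
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(args[0], new RootedPathHandler());
    }
}
```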
Sorry for the late answer, but I did implement a simple XPath expression matcher for SAX parsers. It only supports tags, attributes with an optional value, and indexes, due to SAX's forward-only nature. I made a delegate Handler that evaluates the given expression when the Handler implements ExpressionFilter. Though these classes are embedded into the project, it shouldn't be hard to extract them.
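A rough sketch of the delegate idea (these are not the actual project classes; the names and the single-tag "expression" here are made up for illustration):

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Sketch of a delegating filter: it forwards SAX events to the wrapped
// ContentHandler only while inside elements whose tag matches a simple
// expression (here just a tag name, standing in for tag/attribute/index).
public class MatchingFilter extends XMLFilterImpl {
    private final String tag;
    private int depthInsideMatch;   // > 0 while we are inside a matched element

    public MatchingFilter(String tag) {
        this.tag = tag;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depthInsideMatch > 0 || qName.equals(tag)) {
            depthInsideMatch++;
            super.startElement(uri, local, qName, atts);  // forward to the delegate
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depthInsideMatch > 0) {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depthInsideMatch > 0) {
            super.endElement(uri, local, qName);
            depthInsideMatch--;
        }
    }
}
```

Wire it up with setParent(anXMLReader) and setContentHandler(theRealHandler), and the downstream handler only ever sees the matched subtrees.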
More information: for examples, see the classes with the HandlerHtml prefix.

We regularly parse 1GB+ complex XML files by using a SAX parser which extracts partial DOM trees that can be conveniently queried using XPath. I blogged about it here: http://softwareengineeringcorner.blogspot.com/2012/01/conveniently-processing-large-xml-files.html - sources are available on GitHub - MIT license.
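The blog post and the GitHub sources have the real code; purely as a hedged sketch of the same general pattern (the element names and paths below are invented, and this is not the blogged implementation), you can stream with StAX, lift each interesting element into a small DOM fragment, and run arbitrary XPath against just that fragment:

```java
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

public class PartialDomExample {
    public static void main(String[] args) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream(args[0]));
        Transformer subtreeCopier = TransformerFactory.newInstance().newTransformer();
        XPath xpath = XPathFactory.newInstance().newXPath();

        while (reader.hasNext()) {
            // Stream until the start of an element we care about ("record" is invented).
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "record".equals(reader.getLocalName())) {
                // Copy just this subtree into a small in-memory DOM; the reader
                // continues after the element's end tag.
                DOMResult fragment = new DOMResult();
                subtreeCopier.transform(new StAXSource(reader), fragment);
                Node recordDom = fragment.getNode();

                // Any XPath works here, but only against the small fragment.
                String id = xpath.evaluate("/record/@id", recordDom);
                String name = xpath.evaluate("/record/customer/name", recordDom);
                System.out.println(id + " -> " + name);
            }
        }
        reader.close();
    }
}
```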
I'll toss in a plug for a new project of mine, called AXS. It's at https://code.google.com/p/annotation-xpath-sax/ and the idea is that you annotate methods with (forward-axis-only) XPath statements and they get called when the SAX parser is at a node that matches them. So, given an input document, you can write handler methods along the lines of the sketch below.
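Purely to illustrate the annotate-a-method-with-an-XPath idea (the annotation and types below are locally defined placeholders, not the actual AXS API - see the project site for the real names):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Generic illustration of annotation-driven matching, not the real AXS types.
public class AnnotatedHandlerSketch {

    // A stand-in annotation carrying a forward-axis-only XPath expression.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface MatchPath {
        String value();
    }

    // With AXS-style dispatch, a method like this would be invoked each time
    // the SAX parser reaches a node matching the expression.
    @MatchPath("/html/body//a/@href")
    public void onLink(String href) {
        System.out.println("link: " + href);
    }

    @MatchPath("//table[@class='results']/tr")
    public void onResultRow(org.w3c.dom.Element row) {
        // a small buffered fragment for the matched row could be handed in here
    }
}
```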
Of course, the library is so new that I haven't even made a release of it yet, but it's MIT licensed, so feel free to give it a try and see if it matches your needs. (I wrote it to do HTML screen-scraping with low enough memory requirements that I can run it on old Android devices...) If you find bugs, please let me know by filing them on the Google Code site!
SAX is forward-only, while XPath queries can navigate the document in any direction (consider the parent::, ancestor::, preceding:: and preceding-sibling:: axes). I don't see how this would be possible in general. The best approximation would be some sort of lazy-loading DOM, but depending on your queries this may or may not give you any benefit - there is always a worst-case query such as //*[. != preceding::*].
There are SAX/StAX-based XPath implementations, but they only support a small subset of XPath expressions/axes, largely due to SAX/StAX's forward-only nature. The best alternative I am aware of is extended VTD-XML: it supports full XPath and partial document loading via memory mapping, with a maximum document size of 256GB, but you will need a 64-bit JVM to use it to its full potential.
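For what it's worth, XPath over VTD-XML looks roughly like this (a sketch using the standard VTD-XML classes and an invented path; the extended "huge" variant for memory-mapped multi-GB files uses its own loader classes, so check its docs):

```java
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;

public class VtdXPathExample {
    public static void main(String[] args) throws Exception {
        VTDGen gen = new VTDGen();
        if (!gen.parseFile(args[0], true)) {          // true = namespace aware
            throw new IllegalStateException("parse failed");
        }
        VTDNav nav = gen.getNav();
        AutoPilot ap = new AutoPilot(nav);
        ap.selectXPath("/catalog/book/title");        // the path is just an example

        // evalXPath() walks the matches one by one; -1 means no more matches.
        int i;
        while ((i = ap.evalXPath()) != -1) {
            int textIndex = nav.getText();            // token index of the text node
            if (textIndex != -1) {
                System.out.println(nav.toNormalizedString(textIndex));
            }
        }
    }
}
```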