Is there any XPath processor for SAX model?

2020-01-25 15:42发布

I'm looking for an XPath evaluator that doesn't rebuild the whole DOM document to look for the nodes of a document: actually the object is to manage a large amount of XML data (ideally over 2Gb) with SAX model, which is very good for memory management, and give the possibility to search for nodes.

Thank you all for the support!

For all those who say it's not possible: I recently, after asked the question, found a project named "saxpath" (http://www.saxpath.org/), but I can't find any implementing project.

标签: java xml xpath sax
14条回答
老娘就宠你
2楼-- · 2020-01-25 16:17

Mmh I don't know if I really understand you. As far as I know, the SAX model is event oriented. That means, you do something if a certain node is encountered during the parsing. Yeah, it is better for memory but I don't see how you would like to get XPath into it. As SAX does not build a model, I don't think that this is possible.

查看更多
小情绪 Triste *
3楼-- · 2020-01-25 16:18

I don't think xpath works with SAX, but you might take a look at StAX which is an extended streaming XML API for Java.

http://en.wikipedia.org/wiki/StAX

查看更多
对你真心纯属浪费
4楼-- · 2020-01-25 16:20

Sorry, a slightly late answer here - it seems that this is possible for a subset of XPath - in general it's very difficult due to the fact that XPath can match both forwards and backwards from the "current" point. I'm aware of two projects that solve it to some degree using state machines: http://spex.sourceforge.net & http://www.cs.umd.edu/projects/xsq. I haven't looked at them in detail but they seem to use a similar approach.

查看更多
霸刀☆藐视天下
5楼-- · 2020-01-25 16:23

Did you have tried also QuiXPath https://code.google.com/p/quixpath/ ?

查看更多
Viruses.
6楼-- · 2020-01-25 16:23

What you could do is hook an XSL transformer to a SAX input source. Your processing will be sequential and the XSL preprocessor will make an attempt to catch the input as it comes to fiddle it into whatever result you specified. You can use this to pull a path's value out of the stream. This would come in especially handy if you wanted to produce a bunch of different XPATH results in one pass.

You'll get (typically) an XML document as a result, but you could pull your expected output out of, say, a StreamResult with not too much hassle.

查看更多
Bombasti
7楼-- · 2020-01-25 16:26

Have a look at the streaming mode of the Saxon-SA XSLT-processor.

http://www.saxonica.com/documentation/sourcedocs/serial.html

"The rules that determine whether a path expression can be streamed are:

  • The expression to be streamed starts with a call on the document() or doc() function.
  • The path expression introduced by the call on doc() or document must conform to a subset of XPath defined as follows:

  • any XPath expression is acceptable if it conforms to the rules for path expressions appearing in identity constraints in XML Schema. These rules allow no predicates; the first step (but only the first) can be introduced with "//"; the last step can optionally use the attribute axis; all other steps must be simple Axis Steps using the child axis.

  • In addition, Saxon allows the expression to contain a union, for example doc()/(*/ABC | /XYZ). Unions can also be expressed in abbreviated form, for example the above can be written as doc()//(ABC|XYZ).
  • The expression must either select elements only, or attributes only, or a mixture of elements and attributes.

  • Simple filters (one or more) are also supported. Each filter may apply to the last step or to the expression as a whole, and it must only use downward selection from the context node (the self, child, attribute, descendant, descendant-or-self, or namespace axes). It must not be positional (that is, it must not reference position() or last(), and must not be numeric: in fact, it must be such that Saxon can determine at compile time that it will not be numeric). Filters cannot be applied to unions or to branches of unions. Any violation of these conditions causes the expression to be evaluated without the streaming optimization.

  • These rules apply after other optimization rewrites have been applied to the expression. For example, some FLWOR expressions may be rewritten to a path expression that satisfies these rules.

  • The optimization is enabled only if explicitly requested, either by using the saxon:stream() extension function, or the saxon:read-once attribute on anXSLT xsl:copy-of instruction, or the XQuery pragma saxon:stream. It is available only if the stylesheet or query is processed using Saxon-SA."

Note: It is most likely in the commercial version this facility is available. I've used Saxon extensively earlier, and it is a nice piece of work.

查看更多
登录 后发表回答