How can I find all elements in an XML Schema whose

Posted 2019-08-23 04:59

Question:

Suppose that...

  • I have a complex XML schema, one that imports/includes other schema files, which in turn import/include even more schema files.
  • I want to find all the elements in this XML schema that have a value (i.e., text node) that is declared to be of type QName.
  • I want the location (path) of these elements to be expressed as XPath statements (e.g., /foo/bar).

If I'm writing a Java application, what's the right technology for this job? Is it a schema object model like XSOM? Is it the Java XPath API? Something else?

Edit: For those who want a jumpstart on accessing the SCM in Saxon (per Michael Kay's recommendation below), here's some Java code (sans exception handling):

import javax.xml.transform.sax.SAXSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SchemaManager;
import net.sf.saxon.s9api.XdmDestination;
import net.sf.saxon.s9api.XdmNode;
import org.xml.sax.InputSource;

// Load the XSD into Saxon (a schema-aware Processor requires Saxon-EE)
Processor processor = new Processor(true);
SchemaManager schemaManager = processor.getSchemaManager();
SAXSource saxSource = new SAXSource(new InputSource("path/to/yourSchema.xsd"));
schemaManager.load(saxSource);
// Export the compiled schema component model (SCM) as an XML tree
XdmDestination destination = new XdmDestination();
schemaManager.exportComponents(destination);
XdmNode xdmNode = destination.getXdmNode();
// Print the serialized SCM
System.out.println(xdmNode.toString());

Answer 1:

Querying schema documents is a difficult thing to get right, because in XSD there are so many ways of saying the same thing: for example named model groups and attribute groups complicate your task considerably.
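To make the difficulty concrete, here is a minimal sketch of the naive approach: running the JDK's XPath API directly over a single raw schema document. The class `NaiveQNameFinder` and its method are hypothetical names invented for this illustration, and the sketch only reports element names, not full paths. It assumes the whole schema is in one document and that the element refers to `xs:QName` directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any library.
public class NaiveQNameFinder {
    private static final String XS = XMLConstants.W3C_XML_SCHEMA_NS_URI;

    /** Returns the names of xs:element declarations whose type attribute is literally xs:QName. */
    public static List<String> findQNameElements(String xsdText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsdText)));

            // Match on namespace URI and local name so no NamespaceContext is needed.
            String query = "//*[namespace-uri()='" + XS + "' and local-name()='element']"
                    + "[@name and @type]";
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element decl = (Element) hits.item(i);
                String type = decl.getAttribute("type");
                int colon = type.indexOf(':');
                String prefix = colon < 0 ? null : type.substring(0, colon);
                String local = type.substring(colon + 1);
                // Resolve the prefix in scope at the declaration (null means default namespace).
                if ("QName".equals(local) && XS.equals(decl.lookupNamespaceURI(prefix))) {
                    names.add(decl.getAttribute("name"));
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the schema text", e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='ref' type='xs:QName'/>"
                + "<xs:element name='title' type='xs:string'/>"
                + "<xs:simpleType name='MyQName'><xs:restriction base='xs:QName'/></xs:simpleType>"
                + "<xs:element name='derived' type='MyQName'/>"
                + "</xs:schema>";
        System.out.println(findQNameElements(xsd));
    }
}
```

With the sample schema in `main`, this should report `ref` but not `derived`, even though `derived` also holds a QName value. It also ignores anonymous inline types, `ref=` references, named groups, and anything pulled in through xs:include or xs:import, which is the kind of gap that makes raw-document queries fragile.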

If you're looking for types derived from QName as well as QName itself, then it really gets quite difficult.
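As a rough illustration of why derived types add work, the check below walks a chain of restriction bases until it reaches `xs:QName` or runs out of types. It is only a sketch: `QNameDerivation` and the `baseOf` map are hypothetical, the map is assumed to have been extracted beforehand (type name to base type name), and lexical names stand in for properly namespace-resolved ones. List and union types are not handled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of any library.
public class QNameDerivation {

    /**
     * Follows the base-type chain starting at typeName and reports whether it reaches xs:QName.
     * The seen set guards against cycles in malformed input.
     */
    public static boolean derivesFromQName(String typeName, Map<String, String> baseOf) {
        Set<String> seen = new HashSet<>();
        String current = typeName;
        while (current != null && seen.add(current)) {
            if ("xs:QName".equals(current)) {
                return true;
            }
            current = baseOf.get(current); // null once we reach a type with no recorded base
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> baseOf = new HashMap<>();
        baseOf.put("MyQName", "xs:QName");
        baseOf.put("NarrowerQName", "MyQName");
        baseOf.put("Code", "xs:string");

        System.out.println(derivesFromQName("NarrowerQName", baseOf));
        System.out.println(derivesFromQName("Code", baseOf));
    }
}
```

Building the `baseOf` map correctly from raw schema documents is the hard part, since the definitions may be spread across included, imported, and redefined files.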

Doing it on a "compiled" schema of some kind is therefore much easier than doing it on raw schema documents.

Using XSOM is one approach, though it doesn't have a query capability IIRC. Another approach is to use Saxon's SCM output: this is a representation of the compiled "schema component model" in XML form. Because it is the compiled schema, you don't have to worry about all the complexities of xs:include, xs:redefine, etc., and because it is XML, you can use XQuery on it. (I would recommend XQuery rather than XPath because there will be a lot of joins involved, including recursive joins for which you need user-defined functions.)