Resolving type definitions from imported schema in

Posted 2019-03-12 12:58

I've got this API using JAXB to conveniently use object models, generated from XML Schemas by the XJC (XML-to-Java) compiler, through named references. It abstracts the creation of JAXB contexts and finding ObjectFactory methods away by all sorts of background magic and reflection. The basic gist of it is that you'd always define one general schema, and then any number (may also be 0) of schemas "extending" that general one, each resulting in its own data model. The general schema carries the reusable definitions, the ones extending it use those to compose their own models.

I've now run into the situation where I'd like to reuse the general schema for more than one project. The general type definitions should remain the same across projects, and some code will be built against the abstract classes generated from those. So I'd need to first generate classes for some generic schema, then generate those extending and using them separately. I'm using Maven for my build process.

The problem I'm running into is resolving type definitions from that generic schema in the extending schemas.

Suppose my generic schema is named "general.xsd" and looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.foobar.com/general"
xmlns:gen="http://www.foobar.com/general"
elementFormDefault="qualified" attributeFormDefault="qualified">

    <!-- Element (will usually be root) -->
    <xs:element name="transmission" type="gen:Transmission" />

    <!-- Definition -->
    <xs:complexType name="Transmission" abstract="true">
        <xs:sequence>
            <!-- Generic parts of a transmission would be in here... -->
        </xs:sequence>
    </xs:complexType>

</xs:schema>

Next to that there's a bindings file to do some naming customization and set the package name for the output:

<?xml version="1.0" encoding="UTF-8"?>
<bindings xmlns="http://java.sun.com/xml/ns/jaxb" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/jaxb http://java.sun.com/xml/ns/jaxb/bindingschema_2_0.xsd"
    version="2.1">

    <!-- Bindings for the general schema -->
    <bindings schemaLocation="general.xsd" node="/xs:schema">

        <schemaBindings>
            <package name="com.foobar.models.general"/>
        </schemaBindings>

        <bindings node="//xs:complexType[@name='Transmission']">
            <!-- Some customization of property names here... -->
        </bindings>

    </bindings>

</bindings>

I'd then have the next bit in the POM of that project to generate the Java classes:

<plugin>
    <groupId>org.jvnet.jaxb2.maven2</groupId>
    <artifactId>maven-jaxb21-plugin</artifactId>
    <version>0.8.0</version>
    <executions>
        <execution>
            <id>xjc-generate</id>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <schemaDirectory>${basedir}/src/main/resources/com/foobar/schemas</schemaDirectory>
                <schemaLanguage>XMLSCHEMA</schemaLanguage>
                <addCompileSourceRoot>true</addCompileSourceRoot>
                <episode>true</episode>
                <removeOldOutput>true</removeOldOutput>
            </configuration>
        </execution>
    </executions>
</plugin>

As you can see, I'm using the JAXB2.1 Maven plugin. I've set the option to have an episode file generated for step-wise compilation. The option to remove previous output was for a bug workaround; all it does is make sure everything's cleaned up first so recompilation is forced.

So far so good. That project compiles without a hitch. It should be noted that apart from the generated Java classes, I also package the schemas into the resulting jar file. So those are available on the classpath! The sun-jaxb.episode file is in the META-INF, as it should be.

Then I start on the project that uses schemas which will extend the above, by first importing it. One of the "subtypes" could look like this (I'll call it sub.xsd):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.foobar.com/sub"
xmlns:sub="http://www.foobar.com/sub"
xmlns:gen="http://www.foobar.com/general"
elementFormDefault="qualified" attributeFormDefault="qualified">

    <xs:import namespace="http://www.foobar.com/general" />

    <!-- Definition -->
    <xs:complexType name="SubTransmission">
        <xs:complexContent>
            <xs:extension base="gen:Transmission">
                <xs:sequence>
                    <!-- Additional elements placed here... -->
                </xs:sequence>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>

</xs:schema>

Again, there's a bindings file:

<?xml version="1.0" encoding="UTF-8"?>
<bindings xmlns="http://java.sun.com/xml/ns/jaxb" 
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/jaxb http://java.sun.com/xml/ns/jaxb/bindingschema_2_0.xsd"
    version="2.1">

    <!-- Bindings for sub type -->
    <bindings schemaLocation="sub.xsd" node="/xs:schema">

        <schemaBindings>
            <package name="com.foobar.models.sub"/>
        </schemaBindings>

    </bindings>

</bindings>

And here's the bit from the POM of this project that takes care of the XJC generation:

<plugin>
    <groupId>org.jvnet.jaxb2.maven2</groupId>
    <artifactId>maven-jaxb21-plugin</artifactId>
    <version>0.8.0</version>
    <executions>
        <execution>
            <id>xjc-generate</id>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <schemaDirectory>${basedir}/src/main/resources/com/foobar/schemas</schemaDirectory>
                <schemaLanguage>XMLSCHEMA</schemaLanguage>
                <addCompileSourceRoot>true</addCompileSourceRoot>
                <episode>false</episode>
                <catalog>${basedir}/src/main/resources/com/foobar/schemas/catalog.cat</catalog>
                <episodes>
                    <episode>
                        <groupId>com.foobar</groupId>
                        <artifactId>foobar-general-models</artifactId>
                        <version>1.0.0-SNAPSHOT</version>
                        <scope>compile</scope>
                    </episode>
                </episodes>
                <removeOldOutput>true</removeOldOutput>
            </configuration>
        </execution>
    </executions>
</plugin>

Originally, all the schemas were in a single folder and I had the schemaLocation attribute in the import set to general.xsd, which worked fine. But now that things are separated across projects, I run into problems. The first issue was that the other schema could not be found. I've resolved this by taking the schemaLocation attribute out of the <xs:import /> element, keeping only the namespace attribute, and adding a catalog file (catalog.cat), which you can see referenced in the above POM extract. Its contents are:

PUBLIC "http://www.foobar.com/general" "classpath:/com/foobar/schemas/general.xsd"

This seems to work, since I no longer get an error that states the schema cannot be found. But for some reason, resolving the actual type definitions from the imported schema continues to fail. Here's the exception:

Error while parsing schema(s).Location [ file:/C:/NetBeans_groups/Test/SubModelBundle/src/main/resources/com/foobar/schemas/sub.xsd{...,...}].
org.xml.sax.SAXParseException: src-resolve: Cannot resolve the name 'gen:Transmission' to a(n) 'type definition' component.

Here's what I tried so far:

  • Use a catalog file. Partially successful, since the imported schema can now be found.
  • Have the compilation for the general schema generate an episode file and use this for the compilation of the sub schema. Doesn't appear to make a difference, although this should only play a role once the type was resolved, so I don't think this is important yet.
  • Use a different JAXP (note: not JAXB, JAXP) implementation. It did use a different one, because I could see that in the exception's stack trace, but the end result is the same.
  • Use the maven-jaxb22-plugin instead of 21. No difference.

Looking around online, it seems people have been running into this issue since at least 2006 and it might be related to some Xerces resolver problems. I hope that this is not some bug that's been lurking around for 6 years without anyone caring to fix it. Does someone else have some suggestions? Maybe someone ran into the same problem and found a solution? The only workaround I can think of is to use 'svn:externals' to drag the general schema into the sub project and just regenerate the classes there, but it's dirty and will only work when you can connect to our svn repo.

Many thanks in advance for reading this long post. Do keep in mind that I've taken all of the above from existing projects and replaced some namespaces and other things for anonymity, so some typos are possible.

2 answers
聊天终结者
#2 · 2019-03-12 13:24

This works for me with Maven 2.2.1 when using org.jvnet.jaxb2.maven2.resolver.tools.ClasspathCatalogResolver as the catalog resolver.

Here's a sample configuration:

<plugin>
    <groupId>org.jvnet.jaxb2.maven2</groupId>
    <artifactId>maven-jaxb2-plugin</artifactId>
    <version>0.8.0</version>
    <executions>
        <execution>
            <id>executionId</id>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>
                <schemaDirectory>src/main/resources/META-INF/schemas</schemaDirectory>
                <generatePackage>com.company.project.data</generatePackage>
                <bindingDirectory>src/main/jaxb</bindingDirectory>
                <catalog>src/main/jaxb/catalog.cat</catalog>
                <catalogResolver>org.jvnet.jaxb2.maven2.resolver.tools.ClasspathCatalogResolver</catalogResolver>
                <verbose>false</verbose>
                <extension>true</extension>
                <episodes>
                    <episode>
                        <groupId>com.company.project</groupId>
                        <artifactId>xsd-common-types</artifactId>
                        <version>${xsd-common-types.version}</version>
                    </episode>
                </episodes>
            </configuration>
        </execution>
    </executions>
    <dependencies>
        <dependency>
            <groupId>com.company.project</groupId>
            <artifactId>xsd-common-types</artifactId>
            <version>${xsd-common-types.version}</version>
        </dependency>
    </dependencies>
</plugin>

Trying to use this configuration with Maven 3 results in an org.xml.sax.SAXParseException.

乱世女痞
#3 · 2019-03-12 13:37

This answer was edited. Before, I had a solution using a custom catalog resolver. However, I've found the actual problem now. The explanation follows. For the TL;DR version that provides the solution, scroll to the bottom of this answer.


The problem is with the catalog file. Note how it had this line:

PUBLIC "http://www.foobar.com/general" "classpath:/com/foobar/schemas/general.xsd"

What does that say? It says that if the public ID http://www.foobar.com/general is encountered, the system ID for the schema is classpath:/com/foobar/schemas/general.xsd. So far so good. If we take the schemaLocation attribute out of our <xs:import /> elements, the only thing that remains is the public ID (the namespace URI), and the catalog file tells us where to find the schema for it.

The problem occurs when that schema then uses <xs:include /> elements. They include schema files with the same target namespace. They specify a system ID (relative location). So you'd expect that to be used for resolution. However, logging the calls to the catalog resolver reveals that requests are made for resolution with both the public ID (namespace) and system ID (relative location). And that's where it goes wrong. The public ID is given preference because of the binding in the catalog file. And that leads us straight to the general.xsd file again.

Say for example that the general schema is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.foobar.com/general"
xmlns:gen="http://www.foobar.com/general"
elementFormDefault="qualified" attributeFormDefault="qualified">

    <!-- Including some definitions from another schema in the same location -->
    <xs:include schemaLocation="simple-types.xsd" />

    <!-- Remaining stuff... -->

</xs:schema>
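
For illustration, the included simple-types.xsd could look roughly like the sketch below. This file is not shown in the original setup, so its contents are an assumption; the type name Identifier in particular is made up. The point is only that an included schema declares the same target namespace as general.xsd and lives in the same location:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical simple-types.xsd, stored next to general.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.foobar.com/general"
    xmlns:gen="http://www.foobar.com/general"
    elementFormDefault="qualified" attributeFormDefault="qualified">

    <!-- Illustrative type only; the name "Identifier" is invented for this sketch -->
    <xs:simpleType name="Identifier">
        <xs:restriction base="xs:string">
            <xs:maxLength value="32" />
        </xs:restriction>
    </xs:simpleType>

</xs:schema>
```

Because the namespace is identical, a catalog lookup by public ID cannot tell this file apart from general.xsd; only the relative system ID can.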

And that a schema using that one is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.foobar.com/sub"
xmlns:sub="http://www.foobar.com/sub"
xmlns:gen="http://www.foobar.com/general"
elementFormDefault="qualified" attributeFormDefault="qualified">

    <xs:import namespace="http://www.foobar.com/general" />

    <!-- Remaining stuff... -->

</xs:schema>

When XJC is parsing that last schema, this is happening:

  1. Parsing local definitions.
  2. Encounters reference to definition from imported schema.
  3. Checks import, finds no system ID, only public ID (http://www.foobar.com/general).
  4. Checks catalog(s).
  5. Finds binding of public ID to classpath:/com/foobar/schemas/general.xsd.
  6. Parsing definitions in imported schema.
  7. Encounters reference to definition from included schema (simple-types.xsd).
  8. Checks include, finds system ID.
  9. Checks catalog(s) for the system ID, but the public ID is implicit.
  10. Finds binding of public ID to classpath:/com/foobar/schemas/general.xsd, which takes precedence over the system ID.
  11. Resolution of included schema definitions fails.

The details for the order in which resolution is attempted are described in the OASIS spec for XML catalogs: https://www.oasis-open.org/committees/entity/spec.html#s.ext.ent. It takes a bit of interpretation, but you'll find that if the preferred method of resolution is the public IDs, those will take precedence when bound in the catalog file even if there is a system ID.

The solution, then, has three parts: specify that system IDs are the preferred method of resolution; leave the system IDs (schemaLocation) out of the imports so that the catalog's public ID binding is used; and rely on the relative system IDs in the includes. In the OASIS XML catalog format, you can use the attribute prefer="system". In the OASIS TR9401 catalog format, you can use OVERRIDE no. Apparently the default is public/yes.

So my catalog file then becomes:

OVERRIDE no
PUBLIC "http://www.foobar.com/general" "classpath:/com/foobar/schemas/general.xsd"
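
If you'd rather use the OASIS XML catalog format, the equivalent would look roughly like the sketch below. I only verified the TR9401 variant above, so treat this as an untested translation; the file name (say, catalog.xml) is an assumption and would have to be referenced from the <catalog> option in the POM instead of catalog.cat:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- prefer="system" plays the role of "OVERRIDE no" in the TR9401 file:
     a public ID binding is only used when no system ID is supplied -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="system">

    <!-- Same binding as the PUBLIC directive: namespace to schema on the classpath -->
    <public publicId="http://www.foobar.com/general"
            uri="classpath:/com/foobar/schemas/general.xsd" />

</catalog>
```

With this in place, an <xs:import /> without schemaLocation still resolves through the public entry, while an <xs:include /> with a relative schemaLocation is resolved by its system ID.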

Now the regular catalog resolver works fine. I no longer need the custom one. However, I wouldn't have guessed that the public ID is still used for resolution when including schemas and takes precedence over the system ID. I'd have thought the public ID would only be used for imports, and that the system ID would still be considered if resolution failed. Only adding some logging to the custom resolver revealed this.


The short answer: add OVERRIDE no as the first directive in your TR9401 catalog file, or attribute prefer="system" to an XML catalog file. Don't specify schemaLocation in <xs:import /> directives, but bind the namespace to the proper schema location in the catalog file. Make sure <xs:include /> uses a relative path to the included schema.

Another interesting thing: the catalog resolver used by XJC can handle not just classpath: URIs, but also maven: URIs, which work relative to a Maven artefact. Pretty useful if you're using that as your build tool. http://confluence.highsource.org/display/MJIIP/User+Guide#UserGuide-Usingcatalogs
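
As a rough sketch of what a maven: binding could look like for the general schema: the URI layout used here (groupId:artifactId:type:classifier:version, then !/ and the path inside the artefact) follows my reading of the user guide linked above, so check it against the plugin version you use. The coordinates are the ones from the question's POM, and I'm assuming the artefact is also declared as a dependency so the empty version can be resolved:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the maven: URI syntax is an assumption taken from the plugin's user guide -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="system">

    <!-- Resolve the namespace to general.xsd inside the foobar-general-models jar -->
    <public publicId="http://www.foobar.com/general"
            uri="maven:com.foobar:foobar-general-models:jar::!/com/foobar/schemas/general.xsd" />

</catalog>
```

Compared to classpath:, this names the artefact that holds the schema explicitly instead of depending on whatever happens to be on the plugin's classpath.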
