JAXB convert non-ASCII characters to ASCII charact

2019-02-10 19:37发布

I have some xsd schemas that element names contains non-ASCII characters. When I generate java classes using Generate JAXB Classes command using Eclipse Kepler, generated classes and variables of them contains non-ASCII characters. I want to transform this non-ASCII characters to ASCII characters.

I already set locale at JAVA_TOOL_OPTIONS

-Duser.country=GB -Duser.language=en

For example

İ -> I
Ç -> C
Ş -> S
Ö -> O
Ğ -> G
Ü -> U
ı -> i
ö -> o
ü -> u
ç -> c
ğ -> g
ş -> s

1条回答
▲ chillily
2楼-- · 2019-02-10 20:30

EDIT: Since the requirement is of a generic solution and not using the external binding files, I have offered 2 options below:

Option 1 - A Generic Solution - Create a Custom XJC plugin to normalize

The generic solution is effectively:

  1. Extend com.sun.tools.xjc.Plugin abstract class and override methods that JAXB uses to name the artifacts - create a plugin bascially
  2. Pack this implementation in a jar after specifically calling out the name of the implementation within the services directory of the META-INF folder inside the jar
  3. Deploy this newly created jar along with jaxb libs and run it through ANT (build.xml provided below, read on)

For your purpose, I have created the plugin for which you can download the jar from here, download the ant script (build.xml) from here. Put the jar to your build path in eclipse and edit the ant file to provide your locations of your JAXB libs, target package of the generated classes, project name and schema location and run it. That's it!

Explanation:

I created a custom XJC plugin with an extra command line option -normalize to replace the accented characters in your created Java classes, methods, variables, properties and interfaces with their ASCII equivalents.

XJC has the capability of custom plugins creation to control the names, annotations and other attributes of the generated classes, variables and so on. This blog post though old can get you started with the basics of such plugin implementations.

Long story short, I created a class extending the abstract com.sun.tools.xjc.Plugin class, overriding its methods important one being onActivated.

In this method, I have set com.sun.tools.xjc.Option#setNameConverter to a custom class which takes care of overriding the required methods of acquiring names of the class, methods etc. I have committed the source to my git repo here as well, below is the detailed usage of it:

import java.text.Normalizer;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;

import com.sun.tools.xjc.BadCommandLineException;
import com.sun.tools.xjc.Options;
import com.sun.tools.xjc.Plugin;
import com.sun.tools.xjc.outline.Outline;
import com.sun.xml.bind.api.impl.NameConverter;

/**
 * {@link Plugin} that normalized the names of JAXB generated artifacts
 * 
 * @author popofibo
 */
public class NormalizeElements extends Plugin {

    /**
     * Set the command line option
     */
    @Override
    public String getOptionName() {
        return "normalize";
    }

    /**
     * Usage content of the option
     */
    @Override
    public String getUsage() {
        return "  -normalize    :  normalize the classes and method names generated by removing the accented characters";
    }

    /**
     * Set the name converted option to a delegated custom implementation of
     * NameConverter.Standard
     */
    @Override
    public void onActivated(Options opts) throws BadCommandLineException {
        opts.setNameConverter(new NonAsciiConverter(), this);
    }

    /**
     * Always return true
     */
    @Override
    public boolean run(Outline model, Options opt, ErrorHandler errorHandler)
            throws SAXException {
        return true;
    }

}

/**
 * 
 * @author popofibo
 * 
 */
class NonAsciiConverter extends NameConverter.Standard {

    /**
     * Override the generated class name
     */
    @Override
    public String toClassName(String s) {
        String origStr = super.toClassName(s);
        return normalize(origStr);
    }

    /**
     * Override the generated property name
     */
    @Override
    public String toPropertyName(String s) {
        String origStr = super.toPropertyName(s);
        return normalize(origStr);
    }

    /**
     * Override the generated variable name
     */
    @Override
    public String toVariableName(String s) {
        String origStr = super.toVariableName(s);
        return normalize(origStr);
    }

    /**
     * Override the generated interface name
     */
    @Override
    public String toInterfaceName(String s) {
        String origStr = super.toInterfaceName(s);
        return normalize(origStr);
    }

    /**
     * Match the accented characters within a String choosing Canonical
     * Decomposition option of the Normalizer, regex replaceAll using non POSIX
     * character classes for ASCII
     * 
     * @param accented
     * @return normalized String
     */
    private String normalize(String accented) {
        String normalized = Normalizer.normalize(accented, Normalizer.Form.NFD);
        normalized = normalized.replaceAll("[^\\p{ASCII}]", "");
        return normalized;
    }
}

To enable this plugin with the normal jaxb unmarshalling is to pack these class in a jar, add /META-INF/services/com.sun.tools.xjc.Plugin file within the jar and put it in your build path.

/META-INF/services/com.sun.tools.xjc.Plugin file within the jar:

enter image description here

This file reads:

com.popofibo.plugins.jaxb.NormalizeElements

As mentioned before, I pack it in a jar, deploy it in my eclipse build path, now the problem I ran in to with running eclipse kepler with JDK 1.7 is I get this exception (message):

com.sun.tools.xjc.plugin Provider <my class> not a subtype

Hence, it's better to generate the classes using ANT, the following build.xml does justice to the work done so far:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project name="SomeProject" default="createClasses">

    <taskdef name="xjc" classname="com.sun.tools.xjc.XJC2Task">
        <classpath>
            <pathelement
                path="C:/Workspace/jaxb-ri-2.2.7/jaxb-ri-2.2.7/lib/jaxb-xjc.jar" />
            <pathelement
                path="C:/Workspace/jaxb-ri-2.2.7/jaxb-ri-2.2.7/lib/jaxb-impl.jar" />
            <pathelement
                path="C:/Workspace/jaxb-ri-2.2.7/jaxb-ri-2.2.7/lib/jaxb2-value-constructor.jar" />
            <pathelement path="C:/Workspace/normalizeplugin_xjc_v0.4.jar" />
        </classpath>
    </taskdef>

    <target name="clean">
        <delete dir="src/com/popofibo/jaxb" />
    </target>

    <target name="createClasses" depends="clean">
        <xjc schema="res/some.xsd" destdir="src" package="com.popofibo.jaxb"
            encoding="UTF-8">
            <arg value="-normalize" />
        </xjc>
    </target>
</project>

The schema to showcase this normalization process I chose was:

<xs:element name="shiporder">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Örderperson" type="xs:string"/>
      <xs:element name="Şhİpto">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="address" type="xs:string"/>
            <xs:element name="Çity" type="xs:string"/>
            <xs:element name="ÇoÜntry" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="İtem" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="note" type="xs:string" minOccurs="0"/>
            <xs:element name="qÜantity" type="xs:positiveInteger"/>
            <xs:element name="price" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute name="orderid" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

</xs:schema> 

As you can see, I have set the argument and package as to where I want to have my classes generated, and voila - the ASCII names for classes, methods, variables in the generated artifacts (the only gap I see is with the XML annotations which would not affect the cause but also easy to overcome):

enter image description here

The above screenshot shows the names were normalized and are replaced by their ASCII counterparts (to check how it would look without the replacement, please refer to the screenshots in option 2).

Option 2 - Using External binding file

To remove accented characters, you can create a custom binding file and use it to bind your class and property names while generating your classes. Refer to: Creating an External Binding Declarations File Using JAXB Binding Declarations

I took the xsd already mentioned in Option 1 with element names containing "accented" (Non-ASCII) characters:

If I generate the classes without specifying the external binding, I get the following outputs:

enter image description here!

Now if I change the binding a bit to generate class names and variables of my choice, I write my binding.xml as:

<jxb:bindings xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:jxb="http://java.sun.com/xml/ns/jaxb" version="2.1">
    <jxb:globalBindings localScoping="toplevel" />

    <jxb:bindings schemaLocation="some.xsd">
        <jxb:bindings node="//xs:element[@name='Şhİpto']">
            <jxb:class name="ShipTo" />
        </jxb:bindings>
        <jxb:bindings node="//xs:element[@name='Örderperson']">
            <jxb:property name="OrderPerson" />
        </jxb:bindings>
        <jxb:bindings node="//xs:element[@name='Şhİpto']//xs:complexType">
            <jxb:class name="ShipToo" />
        </jxb:bindings>
    </jxb:bindings>

</jxb:bindings>

Now when I generate my class through eclipse by specifying the binding file:

enter image description here

In the next steps, I choose the package and the binding file I get,

enter image description here

Note: If you are not using eclipse to generate your classes, you might want to check xjc binding compiler out to utilize your external binding file.

查看更多
登录 后发表回答