
Is there some equivalent in Java to Ruby's Nok

Posted 2019-08-02 07:21

Question:

I need to add an internal DTD subset (ENTITY declarations enclosed in square brackets inside the DOCTYPE declaration) to an existing XML document.

For example, working from the specification for MathML in DAISY at http://www.daisy.org/projects/mathml/mathml-in-daisy-spec.html, say I am given this XML by an outside source:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dtbook PUBLIC "-//NISO//DTD dtbook 2005-2//EN"
 "http://www.daisy.org/z3986/2005/dtbook-2005-2.dtd">
<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/" xmlns:m="http://www.w3.org/1998/Math/MathML"
    version="2005-3" xml:lang="eng">
    <m:math xmlns:dtbook="http://www.daisy.org/z3986/2005/dtbook/"
  id="math0001" dtbook:smilref="nativemathml.smil#math0001"
  altimg="nativemathml0001.png"
  alttext="sigma-summation UnderScript i equals zero OverScript infinity EndScripts x Subscript i">
      <m:mrow>
        <m:mstyle displaystyle='true'>
          <m:munderover>
            <m:mo>&#x2211;</m:mo>
            <m:mrow>
              <m:mi>i</m:mi><m:mo>=</m:mo><m:mn>0</m:mn>
            </m:mrow>
            <m:mi>&#x221E;</m:mi>
          </m:munderover>
          <m:mrow>
            <m:msub>
              <m:mi>x</m:mi>
              <m:mi>i</m:mi>
            </m:msub>
          </m:mrow>
        </m:mstyle>
      </m:mrow>
</m:math>
</dtbook>

I want to add the ENTITY definitions from the specification to make this book support MathML, so that the result looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dtbook PUBLIC "-//NISO//DTD dtbook 2005-2//EN"
 "http://www.daisy.org/z3986/2005/dtbook-2005-2.dtd"
 [
  <!ENTITY % MATHML.prefixed "INCLUDE" >
  <!ENTITY % MATHML.prefix "m">
  <!ENTITY % MATHML.Common.attrib
          "xlink:href    CDATA       #IMPLIED
          xlink:type     CDATA       #IMPLIED
          class          CDATA       #IMPLIED
          style          CDATA       #IMPLIED
          id             ID          #IMPLIED
          xref           IDREF       #IMPLIED
          other          CDATA       #IMPLIED
          xmlns:dtbook   CDATA       #FIXED 'http://www.daisy.org/z3986/2005/dtbook/'
          dtbook:smilref CDATA       #IMPLIED"
  >
  <!ENTITY % mathML2 PUBLIC "-//W3C//DTD MathML 2.0//EN"
             "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"
  >
  %mathML2;
  <!ENTITY % externalFlow "| m:math">
  <!ENTITY % externalNamespaces "xmlns:m CDATA #FIXED
    'http://www.w3.org/1998/Math/MathML'">
 ]
>
<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/" xmlns:m="http://www.w3.org/1998/Math/MathML"
    version="2005-3" xml:lang="eng">
    <m:math xmlns:dtbook="http://www.daisy.org/z3986/2005/dtbook/"
  id="math0001" dtbook:smilref="nativemathml.smil#math0001"
  altimg="nativemathml0001.png"
  alttext="sigma-summation UnderScript i equals zero OverScript infinity EndScripts x Subscript i">
      <m:mrow>
        <m:mstyle displaystyle='true'>
          <m:munderover>
            <m:mo>&#x2211;</m:mo>
            <m:mrow>
              <m:mi>i</m:mi><m:mo>=</m:mo><m:mn>0</m:mn>
            </m:mrow>
            <m:mi>&#x221E;</m:mi>
          </m:munderover>
          <m:mrow>
            <m:msub>
              <m:mi>x</m:mi>
              <m:mi>i</m:mi>
            </m:msub>
          </m:mrow>
        </m:mstyle>
      </m:mrow>
</m:math>
</dtbook>

In Ruby, there is a method in Nokogiri that can be used to add these ENTITY definitions that looks like this: Nokogiri::XML::EntityDecl.new("MATHML.prefixed", doc, MATHML_ENTITY_DECL_TYPE, nil, nil, "INCLUDE")

Is there an equivalent to this in Java? We are using JDOM to manipulate our XML documents, but the JDOM DocType class doesn't appear to support these entity definitions.

Answer 1:

With JDOM you should be able to parse the original document and get the DocType node from it.

Your code would look something like:

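// Parse the source document and fetch its existing DOCTYPE declaration.
SAXBuilder saxBuilder = new SAXBuilder();
Document doc = saxBuilder.build(myxmlfile);
DocType dtd = doc.getDocType();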

That DocType should hold the existing reference to the dtbook DTD (its public and system IDs).

You can now take the string representation of the MathML declarations and set it as the internal subset of the DocType (perhaps you want to read it from a file, or as a system resource, or something similar).

String internal = "  <!ENTITY % MATHML.prefixed \"INCLUDE\" >\n"
    + "  <!ENTITY % MATHML.prefix \"m\">\n"
    // ...... (the remaining declarations are appended here in the same way)
    ;

dtd.setInternalSubset(internal);

See: http://www.jdom.org/docs/apidocs/org/jdom2/DocType.html#setInternalSubset(java.lang.String)
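
The approach above hands JDOM the internal subset as plain text, so the closest analogue to calling Nokogiri's EntityDecl.new once per entity is probably a small helper that formats each declaration. The sketch below is one way to do that using only the JDK. InternalSubsetBuilder and its method names are hypothetical (they are not part of JDOM), and it assumes the entity values contain no % or & characters that would need escaping, which holds for the declarations in the question.

```java
/** Hypothetical helper (not part of JDOM): assembles a DTD internal subset as text. */
final class InternalSubsetBuilder {
    private final StringBuilder subset = new StringBuilder();

    /** Appends an internal parameter entity: <!ENTITY % name "value"> */
    InternalSubsetBuilder parameterEntity(String name, String value) {
        return line("<!ENTITY % " + name + " " + quote(value) + ">");
    }

    /** Appends an external parameter entity: <!ENTITY % name PUBLIC "publicId" "systemId"> */
    InternalSubsetBuilder externalParameterEntity(String name, String publicId, String systemId) {
        return line("<!ENTITY % " + name + " PUBLIC " + quote(publicId) + " " + quote(systemId) + ">");
    }

    /** Appends a parameter entity reference such as %mathML2; */
    InternalSubsetBuilder reference(String name) {
        return line("%" + name + ";");
    }

    /** Returns the accumulated declarations, one per line, indented by two spaces. */
    String build() {
        return subset.toString();
    }

    private InternalSubsetBuilder line(String declaration) {
        subset.append("  ").append(declaration).append('\n');
        return this;
    }

    /** Wraps a literal in double quotes, or in single quotes if it contains double quotes. */
    private static String quote(String value) {
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        throw new IllegalArgumentException("Literal contains both quote characters: " + value);
    }
}
```

The string returned by build() could then be passed to dtd.setInternalSubset(...) in place of the hand-written literal, for example after chaining parameterEntity("MATHML.prefixed", "INCLUDE") and parameterEntity("MATHML.prefix", "m").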

Setting the internal subset modifies the DOCTYPE declaration, so when you output the XML you should get the content you expect:

XMLOutputter xout = new XMLOutputter();
xout.output(doc, System.out);