Is there some equivalent in Java to Ruby's Nok

Posted 2019-08-02 07:24

I need to add an internal DTD subset (a set of ENTITY declarations enclosed in square brackets inside the DOCTYPE declaration) to an existing XML document.

For example, working from the specification for MathML in DAISY at http://www.daisy.org/projects/mathml/mathml-in-daisy-spec.html, say I am given this XML by an outside source:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dtbook PUBLIC "-//NISO//DTD dtbook 2005-2//EN"
 "http://www.daisy.org/z3986/2005/dtbook-2005-2.dtd">
<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/" xmlns:m="http://www.w3.org/1998/Math/MathML"
    version="2005-3" xml:lang="eng">
    <m:math xmlns:dtbook="http://www.daisy.org/z3986/2005/dtbook/"
  id="math0001" dtbook:smilref="nativemathml.smil#math0001"
  altimg="nativemathml0001.png"
  alttext="sigma-summation UnderScript i equals zero OverScript infinity EndScripts x Subscript i">
      <m:mrow>
        <m:mstyle displaystyle='true'>
          <m:munderover>
            <m:mo>&#x2211;</m:mo>
            <m:mrow>
              <m:mi>i</m:mi><m:mo>=</m:mo><m:mn>0</m:mn>
            </m:mrow>
            <m:mi>&#x221E;</m:mi>
          </m:munderover>
          <m:mrow>
            <m:msub>
              <m:mi>x</m:mi>
              <m:mi>i</m:mi>
            </m:msub>
          </m:mrow>
        </m:mstyle>
      </m:mrow>
</m:math>
</dtbook>

I want to add the ENTITY definitions from the specification to make this book support MathML, so that the result looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dtbook PUBLIC "-//NISO//DTD dtbook 2005-2//EN"
 "http://www.daisy.org/z3986/2005/dtbook-2005-2.dtd"
 [
  <!ENTITY % MATHML.prefixed "INCLUDE" >
  <!ENTITY % MATHML.prefix "m">
  <!ENTITY % MATHML.Common.attrib
          "xlink:href    CDATA       #IMPLIED
          xlink:type     CDATA       #IMPLIED
          class          CDATA       #IMPLIED
          style          CDATA       #IMPLIED
          id             ID          #IMPLIED
          xref           IDREF       #IMPLIED
          other          CDATA       #IMPLIED
          xmlns:dtbook   CDATA       #FIXED 'http://www.daisy.org/z3986/2005/dtbook/'
          dtbook:smilref CDATA       #IMPLIED"
  >
  <!ENTITY % mathML2 PUBLIC "-//W3C//DTD MathML 2.0//EN"
             "http://www.w3.org/Math/DTD/mathml2/mathml2.dtd"
  >
  %mathML2;
  <!ENTITY % externalFlow "| m:math">
  <!ENTITY % externalNamespaces "xmlns:m CDATA #FIXED
    'http://www.w3.org/1998/Math/MathML'">
 ]
>
<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/" xmlns:m="http://www.w3.org/1998/Math/MathML"
    version="2005-3" xml:lang="eng">
    <m:math xmlns:dtbook="http://www.daisy.org/z3986/2005/dtbook/"
  id="math0001" dtbook:smilref="nativemathml.smil#math0001"
  altimg="nativemathml0001.png"
  alttext="sigma-summation UnderScript i equals zero OverScript infinity EndScripts x Subscript i">
      <m:mrow>
        <m:mstyle displaystyle='true'>
          <m:munderover>
            <m:mo>&#x2211;</m:mo>
            <m:mrow>
              <m:mi>i</m:mi><m:mo>=</m:mo><m:mn>0</m:mn>
            </m:mrow>
            <m:mi>&#x221E;</m:mi>
          </m:munderover>
          <m:mrow>
            <m:msub>
              <m:mi>x</m:mi>
              <m:mi>i</m:mi>
            </m:msub>
          </m:mrow>
        </m:mstyle>
      </m:mrow>
</m:math>
</dtbook>

In Ruby, Nokogiri provides a constructor that can be used to add these ENTITY definitions. It looks like this: Nokogiri::XML::EntityDecl.new("MATHML.prefixed", doc, MATHML_ENTITY_DECL_TYPE, nil, nil, "INCLUDE")

Is there an equivalent to this in Java? We are using JDOM to manipulate our XML documents, but the JDOM DocType class doesn't appear to support these entity definitions.

1 answer
欢心
#2 · 2019-08-02 08:03

With JDOM you should be able to parse the original document and pull the DocType node from it.

Your code would look something like:

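// Uses org.jdom2.Document, org.jdom2.DocType and org.jdom2.input.SAXBuilder
SAXBuilder saxBuilder = new SAXBuilder();
Document doc = saxBuilder.build(myxmlfile);
DocType dtd = doc.getDocType();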

That DocType should hold the existing reference to the dtbook DTD (its public and system IDs).

You can now take the string representation of the MathML declarations and set it as the internal subset of the DocType (you might read it from a file, load it as a system resource, or build it in code).

String internal = "  <!ENTITY % MATHML.prefixed \"INCLUDE\" >\n"
    + "  <!ENTITY % MATHML.prefix \"m\">\n"
    + ......

dtd.setInternalSubset(internal);

See: http://www.jdom.org/docs/apidocs/org/jdom2/DocType.html#setInternalSubset(java.lang.String)
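If the long string concatenation gets unwieldy, the internal subset can be assembled from a list of declarations instead. The following is only a sketch using the plain JDK: the class name InternalSubsetBuilder and its build method are hypothetical helpers, not part of JDOM, and only the first two declarations from the question are listed.

```java
import java.util.List;
import java.util.stream.Collectors;

class InternalSubsetBuilder {

    /**
     * Joins DTD declarations into a single internal-subset string,
     * one declaration per line, each indented by two spaces.
     */
    static String build(List<String> declarations) {
        return declarations.stream()
                .map(decl -> "  " + decl.trim())
                .collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        // Only two of the declarations from the question are shown;
        // the others would be added to this list in the same way.
        String internal = build(List.of(
                "<!ENTITY % MATHML.prefixed \"INCLUDE\" >",
                "<!ENTITY % MATHML.prefix \"m\">"));
        System.out.print(internal);
        // With JDOM on the classpath, the result would then be passed on:
        // dtd.setInternalSubset(internal);
    }
}
```

Keeping each declaration as its own list entry makes it easier to see which entities are being added and to load them from a file later if needed.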

This will modify the declaration, and, if you output the XML, you should have the content you expect:

XMLOutputter xout = new XMLOutputter();
xout.output(doc, System.out);
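
Putting the steps together, a minimal end-to-end sketch might look like the following. It assumes JDOM 2 is on the classpath. To stay self-contained it builds a tiny dtbook document in memory rather than parsing a file, which also sidesteps any attempt by the parser to fetch the external DTD (whether that happens depends on the parser configuration). The class name DocTypeSubsetDemo and the render method are made up for illustration.

```java
import org.jdom2.DocType;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.Namespace;
import org.jdom2.output.Format;
import org.jdom2.output.XMLOutputter;

class DocTypeSubsetDemo {

    /** Builds a minimal dtbook document, attaches the internal subset, and serializes it. */
    static String render(String internalSubset) {
        // Same public and system IDs as the DOCTYPE in the question.
        DocType dtd = new DocType("dtbook",
                "-//NISO//DTD dtbook 2005-2//EN",
                "http://www.daisy.org/z3986/2005/dtbook-2005-2.dtd");
        Element root = new Element("dtbook",
                Namespace.getNamespace("http://www.daisy.org/z3986/2005/dtbook/"));
        Document doc = new Document(root, dtd);

        // The subset string is written out as-is between "[" and "]".
        doc.getDocType().setInternalSubset(internalSubset);

        return new XMLOutputter(Format.getPrettyFormat()).outputString(doc);
    }

    public static void main(String[] args) {
        String internal = "  <!ENTITY % MATHML.prefixed \"INCLUDE\" >\n"
                + "  <!ENTITY % MATHML.prefix \"m\">\n";
        System.out.println(render(internal));
    }
}
```

In real use, the Document would come from saxBuilder.build(...) as shown earlier, and only the setInternalSubset call and the output step would be needed.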