Transform XML with XSLT in Java using DOM

Posted 2019-05-31 16:36

Question:

I've been looking everywhere for Java samples that transform an XML document with an XSLT stylesheet. I've found several samples that use new File("path/to/file.xml") to load the XML and the XSLT, and those work great. My problem is that I'm trying to use this in a new method that accepts two org.w3c.dom.Document objects. As soon as I replace the StreamSource used to load the XSLT with a DOMSource, the result of my call is the XSLT itself instead of the transformed XML.

Working code from "How to call XSL template from java code?":

Source xmlInput = new StreamSource(new File("c:/path/to/input.xml"));
Source xsl = new StreamSource(new File("c:/path/to/file.xsl"));
Result xmlOutput = new StreamResult(new File("c:/path/to/output.xml"));

try {
    Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
    transformer.transform(xmlInput, xmlOutput);
} catch (TransformerException e) {
    // Handle.
}
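For comparison, here is a self-contained sketch of the same StreamSource approach, reading the XML and XSLT from strings instead of files. The class name, the inline documents, and the /root/@a select are hypothetical stand-ins for the files above. Because both inputs are StreamSources, the transformer parses them itself, so no DocumentBuilder (and no namespace-awareness setting) is involved.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StreamTransformSketch {

    // Hypothetical inline stylesheet standing in for c:/path/to/file.xsl.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><out><xsl:value-of select='/root/@a'/></out></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical inline input standing in for c:/path/to/input.xml.
    static final String XML = "<root a='hi'/>";

    /** Transforms xml with xsl; the transformer parses both strings itself. */
    public static String transform(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Expected to print the transformed document, <out>hi</out>.
        System.out.println(transform(XML, XSL));
    }
}
```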

My code:

public static Document transformXML(Document xml, Document xslt) throws TransformerException, UnsupportedEncodingException, SAXException, IOException, ParserConfigurationException, FactoryConfigurationError{

    Source xmlSource = new DOMSource(xml);
    Source xsltSource = new DOMSource(xslt);
    StreamResult result = new StreamResult(new StringWriter());

    // the factory pattern supports different XSLT processors
    TransformerFactory transFact =
            TransformerFactory.newInstance();
    Transformer trans = transFact.newTransformer(xsltSource);

    trans.transform(xmlSource, result);

    Document resultDoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new ByteArrayInputStream(result.getWriter().toString().getBytes("utf-8")));

    return resultDoc;
}

My result document is then the XSLT instead of the XML. What am I doing wrong with the DOMSource?

Answer 1:

XSLT and XPath only make sense with a namespace-aware DOM implementation and DOM tree. That is why I asked in my comment: "Are the DOM trees you feed to the transformer built with a namespace aware document builder?"

As far as I have tested with Oracle Java 1.8, when a non-namespace-aware DocumentBuilderFactory and the built-in Transformer are used, your method returns the stylesheet code. However, as soon as I make the DocumentBuilderFactory namespace aware, the result is as intended.
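To see what namespace awareness changes in the DOM itself, the sketch below parses a minimal stylesheet twice and prints the root element's name properties. The class and method names are hypothetical. Note that DocumentBuilderFactory is not namespace aware by default; in that mode the parser builds DOM Level 1 nodes, for which getLocalName() and getNamespaceURI() return null, so the root only carries the raw name "xsl:stylesheet" and nothing identifies it as belonging to the XSLT namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwarenessDemo {

    // Minimal stylesheet root; the content does not matter for this check.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'/>";

    /** Parses xml and returns its root element; the flag mirrors setNamespaceAware. */
    public static Element parseRoot(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware); // the factory default is false
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (boolean aware : new boolean[] {false, true}) {
            Element root = parseRoot(XSL, aware);
            System.out.println("namespaceAware=" + aware
                + " nodeName=" + root.getNodeName()
                + " localName=" + root.getLocalName()
                + " namespaceURI=" + root.getNamespaceURI());
        }
    }
}
```

With namespace awareness off, localName and namespaceURI should both print as null; with it on, they become stylesheet and http://www.w3.org/1999/XSL/Transform.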

Here is the working sample:

package domsourcetest1;

import java.io.IOException;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.SAXException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

/**
 *
 * @author Martin Honnen
 */
public class DOMSourceTest1 {


    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, TransformerException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document xslt = db.parse("sheet1.xsl");
        Document xml = db.newDocument();
        xml.appendChild(xml.createElementNS(null, "root"));
        Document result = transformXML(xml, xslt);
        System.out.println(result.getDocumentElement().getTextContent());
        LSSerializer serializer = ((DOMImplementationLS) xml.getImplementation()).createLSSerializer();
        System.out.println(serializer.writeToString(result));
    }

    public static Document transformXML(Document xml, Document xslt) throws TransformerException, ParserConfigurationException, FactoryConfigurationError {

        Source xmlSource = new DOMSource(xml);
        Source xsltSource = new DOMSource(xslt);
        DOMResult result = new DOMResult();

        // the factory pattern supports different XSLT processors
        TransformerFactory transFact
                = TransformerFactory.newInstance();
        Transformer trans = transFact.newTransformer(xsltSource);

        trans.transform(xmlSource, result);

        Document resultDoc = (Document) result.getNode();

        return resultDoc;
    }
}

The sample stylesheet simply outputs information about the XSLT processor:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <debug>
            <xsl:value-of select="system-property('xsl:vendor')"/>
        </debug>
    </xsl:template>

</xsl:stylesheet>

The output of the program is:

Apache Software Foundation (Xalan XSLTC)
<?xml version="1.0" encoding="UTF-16"?>
<debug>Apache Software Foundation (Xalan XSLTC)</debug>

Now, when I comment out the line dbf.setNamespaceAware(true); in the main method, the result is:

<?xml version="1.0" encoding="UTF-16"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:template match="/"><debug><xsl:value-of select="system-property('xsl:vendor')"/></debug></xsl:template></xsl:stylesheet>

meaning the result is indeed the stylesheet document. That is obviously a bug, or at least a quirk, in the built-in Xalan Transformer: when I put Saxon 6.5.5 on the class path the problem does not occur, nor does it occur with Saxon 9.6 on the class path.
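Since the built-in processor silently returns the stylesheet instead of reporting an error, a method that receives Document objects from elsewhere may want to fail fast. The sketch below is a hypothetical variant of transformXML (class name, requireNamespaceAware, and the inline documents are all assumptions) that rejects a DOM whose root element has no local name, which is the signature of a non-namespace-aware (DOM Level 1) build. It is a shallow, root-only heuristic: it does not fix the quirk, it only makes it visible.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class GuardedDomTransform {

    static final String XSL =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "<xsl:template match='/'><out><xsl:value-of select='/root/@a'/></out></xsl:template>"
        + "</xsl:stylesheet>";
    static final String XML = "<root a='hi'/>";

    /** Shallow check: a root element without a local name was built without namespace awareness. */
    static void requireNamespaceAware(Document doc, String what) {
        Element root = doc.getDocumentElement();
        if (root != null && root.getLocalName() == null) {
            throw new IllegalArgumentException(
                what + " DOM was not built namespace aware; use setNamespaceAware(true)");
        }
    }

    /** Same shape as transformXML in the answer, plus a fail-fast guard on both inputs. */
    public static Document transformXML(Document xml, Document xslt) throws TransformerException {
        requireNamespaceAware(xml, "input");
        requireNamespaceAware(xslt, "stylesheet");
        DOMResult result = new DOMResult(); // the transformer creates the result Document
        TransformerFactory.newInstance()
                .newTransformer(new DOMSource(xslt))
                .transform(new DOMSource(xml), result);
        return (Document) result.getNode();
    }

    /** Hypothetical helper: parses a string with the given namespace awareness. */
    static Document parse(String s, boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(s)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws TransformerException {
        Document ok = transformXML(parse(XML, true), parse(XSL, true));
        System.out.println(ok.getDocumentElement().getTextContent());
        try {
            transformXML(parse(XML, true), parse(XSL, false));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A deeper check could walk the whole tree, but the root element is normally enough to catch a builder that was never switched to namespace-aware mode.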

In general, however, I don't think you will get meaningful results when using XSLT or XPath with non-namespace-aware DOM trees. See also the DOM2DOM sample in the Xalan release, http://svn.apache.org/viewvc/xalan/java/tags/xalan-j_2_7_2/samples/DOM2DOM/DOM2DOM.java?revision=1695338&view=markup, which says:

  // And setNamespaceAware, which is required when parsing xsl files
  dFactory.setNamespaceAware(true);


Tags: java xml xslt dom