Writing Out a DOM as an XML File

Posted 2019-06-14 11:26

Question:

Straight from the manual:

Writing Out a DOM as an XML File

After you have constructed a DOM (either by parsing an XML file or building it programmatically) you frequently want to save it as XML. This section shows you how to do that using the Xalan transform package.

Using that package, you will create a transformer object to wire a DOMSource to a StreamResult. You will then invoke the transformer's transform() method to write out the DOM as XML data.

My output:

thufir@dur:~/NetBeansProjects/helloWorldSaxon$ 
thufir@dur:~/NetBeansProjects/helloWorldSaxon$ gradle clean run

> Task :run
Jan 04, 2019 3:28:24 PM helloWorldSaxon.HandlerForXML createDocumentFromURL
INFO: http://books.toscrape.com/
Jan 04, 2019 3:28:26 PM helloWorldSaxon.HandlerForXML createDocumentFromURL
INFO: javax.xml.transform.dom.DOMResult@3cda1055
Jan 04, 2019 3:28:26 PM helloWorldSaxon.HandlerForXML createDocumentFromURL
INFO: html

BUILD SUCCESSFUL in 2s
4 actionable tasks: 4 executed
thufir@dur:~/NetBeansProjects/helloWorldSaxon$ 

First, I'd like more meaningful output showing what the domResult is, what it looks like, or what it contains. More importantly, I'd like to iterate over or traverse the document below:

    public void createDocumentFromURL() throws SAXException, IOException, TransformerException, ParserConfigurationException {
        LOG.info(url.toString());

        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
        Source source = new SAXSource(xmlReader, new InputSource(url.toString()));

        DOMResult domResult = new DOMResult();

        Transformer transformer = transformerFactory.newTransformer();
        transformer.transform(source, domResult);  //how do I find the result of this operation?

        LOG.info(domResult.toString());  //traverse or iterate how?

        DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
//        Document document = documentBuilder.parse();   ///bzzzt, wrong

        Document document = (Document) domResult.getNode();

        LOG.info(document.getDocumentElement().getTagName());
    }

That the output is "html" leads me to believe that this is the parsed HTML. The desired output is that HTML, written out from a Document rather than from a String.

The Oracle documentation on writing out a DOM starts by parsing the document. Is this document not already parsed? Or, to put it another way, how do I establish whether it is an XML file at all?

So... transform it again?

see also:

Java: convert StreamResult to DOM

Answer 1:

You really just have to transform the DOM to your file.

Example

// Create DOM
Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
Element root = document.createElement("Root");
document.appendChild(root);
Element foo = document.createElement("Foo");
foo.appendChild(document.createTextNode("Bar"));
root.appendChild(foo);

You can save that DOM to a file like this:

// Write DOM to file as XML
File xmlFile = new File("/path/to/file.xml");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(document), new StreamResult(xmlFile));

You can also just print the DOM like this:

// Print DOM as XML
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(document), new StreamResult(System.out));

Output

<?xml version="1.0" encoding="UTF-8" standalone="no"?><Root><Foo>Bar</Foo></Root>
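If the goal is a log message rather than console output, the same transform can write into a StringWriter instead of System.out. The following is a minimal sketch using only the JDK; the class name DomToString and the helpers toXml and sampleDocument are made up for illustration, and omitting the XML declaration is just a choice to keep the log line short.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of any library.
final class DomToString {

    // Serializes a DOM node (a Document or any subtree) to an XML string.
    static String toXml(Node node) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Assumption: the XML declaration is not wanted in a log line.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize DOM", e);
        }
    }

    // Builds the same Root/Foo/Bar DOM as the example above.
    static Document sampleDocument() {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = document.createElement("Root");
            document.appendChild(root);
            Element foo = document.createElement("Foo");
            foo.appendChild(document.createTextNode("Bar"));
            root.appendChild(foo);
            return document;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create DOM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(sampleDocument()));
    }
}
```

In the question's code, something like LOG.info(DomToString.toXml(domResult.getNode())) would then log the markup instead of the object's default toString().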

If you want the XML formatted:

// Print DOM as formatted XML
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(new DOMSource(document), new StreamResult(System.out));

Output

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Root>
    <Foo>Bar</Foo>
</Root>
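On the question of whether the document is already parsed: the node returned by DOMResult.getNode() is an ordinary DOM tree, so it can be traversed directly without another parse. Below is a rough sketch of that idea under some assumptions: it feeds a small inline string through the identity transformer in place of the TagSoup SAXSource from the question, and the names DomResultWalk, toDocument, outline and walk are invented for this example.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of any library.
final class DomResultWalk {

    // Runs an identity transform into a DOMResult and returns the resulting Document.
    static Document toDocument(String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("Input is not well-formed XML", e);
        }
    }

    // Returns one line per element, indented two spaces per nesting level.
    static String outline(Node node) {
        StringBuilder out = new StringBuilder();
        walk(node, 0, out);
        return out.toString();
    }

    // Depth-first traversal using getFirstChild / getNextSibling.
    private static void walk(Node node, int depth, StringBuilder out) {
        int childDepth = depth;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            childDepth = depth + 1;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, childDepth, out);
        }
    }

    public static void main(String[] args) {
        Document document = toDocument(
                "<html><head><title>t</title></head><body><p>hi</p></body></html>");
        System.out.print(outline(document));
    }
}
```

The transform throwing a TransformerException is also one practical way to tell that the input was not well-formed XML. A Document obtained this way can be passed to new DOMSource(document) exactly as in the examples above.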