How to unformat an XML file

Posted 2019-07-26 07:24

Question:

I have a method which returns a String with a formatted xml. The method reads the xml from a file on the server and parses it into the string:

Essentially, what the method currently does is:

  private ServletConfig config;

  // Load the XML resource from the servlet context and return it as a String.
  InputStream xmlIn = config.getServletContext().getResourceAsStream(filename + ".xml");
  String xml = IOUtils.toString(xmlIn);
  IOUtils.closeQuietly(xmlIn);
  return xml;

What I need to do is add a new input argument, and based on that value, continue returning the formatted xml, or return unformatted xml.

What I mean with formatted xml is something like:

<xml>
  <root>
    <elements>
       <elem1/>
       <elem2/>
    </elements>
  </root>
</xml>

And what I mean with unformatted xml is something like:

<xml><root><elements><elem1/><elem2/></elements></root></xml>

or:

<xml>
<root>
<elements>
<elem1/>
<elem2/>
</elements>
</root>
</xml>

Is there a simple way to do this?

Answer 1:

Strip all newline characters with String xml = IOUtils.toString(xmlIn).replace("\n", ""). Or replace "\t" instead of "\n" to keep the separate lines but drop the indentation (this only helps if the file is indented with tabs).
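
A minimal sketch of how this could be wired to the new input argument from the question. The class name XmlFormatToggle, the method render and the flag name formatted are made up for illustration. It goes slightly beyond the one-liner above in that it also handles \r\n line endings and space indentation. Since it edits the raw text, line breaks inside text content or CDATA would be removed as well.

```java
import java.util.regex.Pattern;

class XmlFormatToggle {
    // \R matches any line break (\n, \r\n, \r); [ \t]* also swallows the indentation that follows it.
    private static final Pattern LINE_BREAK_AND_INDENT = Pattern.compile("\\R[ \\t]*");

    // Hypothetical helper: 'formatted' stands in for the new input argument from the question.
    static String render(String xml, boolean formatted) {
        if (formatted) {
            return xml; // return the file content untouched
        }
        return LINE_BREAK_AND_INDENT.matcher(xml).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String xml = "<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>\n";
        System.out.println(render(xml, true));
        System.out.println(render(xml, false));
    }
}
```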



Answer 2:

Try something like the following:

TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(
    new StreamSource(new StringReader(
        "<xsl:stylesheet version=\"1.0\"" +
        "   xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" + 
        "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>" +
        "  <xsl:strip-space elements=\"*\"/>" + 
        "  <xsl:template match=\"@*|node()\">" +
        "   <xsl:copy>" +
        "    <xsl:apply-templates select=\"@*|node()\"/>" +
        "   </xsl:copy>" +
        "  </xsl:template>" +
        "</xsl:stylesheet>"
    ))
);
Source source = new StreamSource(new StringReader("xml string here"));
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);

The second source does not have to be a StreamSource. If you have an in-memory Document, for example because you want to modify the DOM before saving, you can use a DOMSource instead:

DOMSource source = new DOMSource(document);

To read an XML file into a Document object:

File file = new File("c:\\MyXMLFile.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();

Enjoy :)
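
Since the question wants a String back rather than output on System.out, here is a rough sketch of the same stylesheet wrapped in a method that writes to a StringWriter. The class name XsltUnformatter and the method unformat are assumptions for illustration, and wrapping the checked exception in an IllegalStateException is just one possible choice.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltUnformatter {
    // Identity transform plus xsl:strip-space, which drops whitespace-only text nodes.
    private static final String STRIP_SPACE_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" indent=\"no\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String unformat(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STRIP_SPACE_XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not unformat XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unformat("<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>"));
    }
}
```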



Answer 3:

If you fancy trying your hand with JAXB then the marshaller has a handy property for setting whether to format (use new lines and indent) the output or not.

JAXBContext jc = JAXBContext.newInstance(packageName);
Marshaller m = jc.createMarshaller();
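// Boolean.TRUE formats the output (new lines and indentation); Boolean.FALSE writes it unformatted.
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);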
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
m.marshal(element, outputStream);

Quite an overhead to get to that stage though... perhaps a good option if you already have a solid XSD.



Answer 4:

If you are sure that the formatted XML looks like this:

<xml>
  <root>
    <elements>
       <elem1/>
       <elem2/>
    </elements>
  </root>
</xml>

you can replace every match of group 1 in ^(\s*)< with "" (using multiline mode, so that ^ matches at the start of each line). This way, the text in the XML won't be changed.
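
In Java this might look like the sketch below. It uses a lookahead instead of a capture group so the '<' itself is kept, which should be equivalent to replacing group 1. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LeadingWhitespaceStripper {
    // (?m) makes ^ match at the start of every line; (?=<) requires a '<' next without consuming it.
    private static final Pattern LEADING_WS_BEFORE_TAG = Pattern.compile("(?m)^\\s+(?=<)");

    static String stripLeadingWhitespace(String xml) {
        return LEADING_WS_BEFORE_TAG.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>";
        System.out.println(stripLeadingWhitespace(xml));
    }
}
```

Lines that do not start with a tag (for example, wrapped text content) are left alone, which is the point of anchoring the match on '<'.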



Answer 5:

Use an empty (identity) transformer and set its indent output property from a parameter, like so:

public static String getStringFromDocument(Document dom, boolean indented) {
    String signedContent = null;
    try {
        StringWriter sw = new StringWriter();
        DOMSource domSource = new DOMSource(dom);
        // Standard JAXP lookup; avoids depending on a specific implementation class.
        TransformerFactory tf = TransformerFactory.newInstance();
        // No stylesheet argument, so this is an identity transformer.
        Transformer trans = tf.newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");

        trans.transform(domSource, new StreamResult(sw));
        signedContent = sw.toString();
    } catch (TransformerException e) {
        e.printStackTrace();
    }
    return signedContent;
}

Works for me.

The key is this line:

 trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");
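
One caveat: INDENT set to "no" only stops the serializer from adding whitespace. If the Document was parsed from an already indented file, the indentation is still present as whitespace-only text nodes and will be written out again. A possible workaround, sketched below with hypothetical names (DomWhitespaceStripper, unformat), is to remove those nodes before serializing. It assumes whitespace-only text is never significant in the document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWhitespaceStripper {
    // Parses the XML, drops whitespace-only text nodes, and serializes without indentation.
    static String unformat(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Select every text node that is empty once whitespace is normalized.
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
            for (int i = 0; i < blanks.getLength(); i++) {
                Node blank = blanks.item(i);
                blank.getParentNode().removeChild(blank);
            }

            Transformer trans = TransformerFactory.newInstance().newTransformer();
            trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            trans.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter sw = new StringWriter();
            trans.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not unformat XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unformat("<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>"));
    }
}
```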


Answer 6:

You can:

1) Remove all runs of consecutive whitespace (but not single whitespace characters) and then replace every >(whitespace)< with ><. This is applicable only if the useful content does not contain multiple consecutive significant whitespace characters.

2) Read it into a DOM tree and serialize it with a non-pretty (compact) serialization:

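    // dom4j classes: SAXReader, Document, OutputFormat, XMLWriter
    SAXReader reader = new SAXReader();
    Reader r = new StringReader(data);
    Document document = reader.read(r);
    // Compact format: no indentation and no added new lines
    OutputFormat format = OutputFormat.createCompactFormat();
    StringWriter sw = new StringWriter();
    XMLWriter writer = new XMLWriter(sw, format);
    writer.write(document);
    // Read the result from the StringWriter, not from the XMLWriter
    String string = sw.toString();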

3) Use canonicalization (but you must somehow tell it that the whitespace you want to remove is insignificant).
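
For option 1, the second step (replacing >(whitespace)< with ><) could be sketched as below. The names are made up, and the same restriction applies: it works on raw text, so it would also touch matching sequences inside comments or CDATA sections.

```java
class InterTagWhitespaceStripper {
    // Removes whitespace that sits between a closing '>' and the next '<'; text content is not touched.
    static String collapse(String xml) {
        return xml.trim().replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        System.out.println(collapse("<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>\n"));
        System.out.println(collapse("<a>hello   world</a>"));
    }
}
```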