How to preserve newlines in CDATA when generating

I want to write some text that contains whitespace characters such as newlines and tabs into an XML file, so I use

Element element = xmldoc.createElement("TestElement");
element.appendChild(xmldoc.createCDATASection(somestring));

but when I read this back in using

Node vs =  xmldoc.getElementsByTagName("TestElement").item(0);
String x = vs.getFirstChild().getNodeValue();

I get a string that has no newlines anymore.
When I look directly at the XML on disk, the newlines seem to be preserved, so the problem occurs when reading the XML file back in.

How can I preserve the newlines?

Thanks!

Tags: java xml newline w3c cdata

5 answers

乱世女痞

#2 · 2019-04-08 07:11

You don't necessarily have to use CDATA to preserve whitespace characters. The XML specification specifies how to encode these characters.

So, for example, if you have an element whose value contains a newline, you should encode it as

  &#xA;

Carriage return:

 &#xD;

And so forth
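
As a rough illustration of the point above, here is a minimal sketch showing that character references such as &#xD; and &#xA; come back as real characters when the element is read with the JDK's default DOM parser. The class name CharRefDemo and the helper rootText are made up for this example and are not from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CharRefDemo {
    // Hypothetical helper: parses a small document and returns the
    // text content of its root element.
    static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // &#xD; is a carriage return and &#xA; a line feed, both written
        // as character references instead of literal characters.
        String xml = "<TestElement>first line&#xD;&#xA;second line</TestElement>";
        String text = rootText(xml);

        // Make the control characters visible when printing.
        System.out.println(text.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Because the carriage return is written as a reference rather than a literal character, it is not folded into a line feed by the parser's line-end normalization.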

Melony?

#3 · 2019-04-08 07:16

You need to check the type of each node using node.getNodeType(). If the type is CDATA_SECTION_NODE, you need to concatenate the CDATA guards to node.getNodeValue().
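
A minimal sketch of that node-type check, assuming the JDK's default (non-coalescing) DocumentBuilderFactory. The names CdataNodeDemo, parse and cdataContent are hypothetical. It also shows one way getFirstChild() can return something other than the CDATA section: when the file has whitespace before the section, the first child is a plain text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataNodeDemo {
    // Hypothetical helper: parses an XML string with the default factory settings.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: concatenates the content of every CDATA section
    // that is a direct child of the given element, skipping other node types.
    static String cdataContent(Node element) {
        StringBuilder sb = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Whitespace before and after the CDATA section, as an indenting
        // writer might produce it.
        String xml = "<TestElement>\n  <![CDATA[first line\nsecond line\n]]>\n</TestElement>";
        Node element = parse(xml).getElementsByTagName("TestElement").item(0);

        // The first child here is a whitespace text node, not the CDATA section.
        System.out.println(element.getFirstChild().getNodeType() == Node.TEXT_NODE);

        String content = cdataContent(element);
        System.out.println(content);

        // Re-adding the CDATA guards, if the markup itself is wanted.
        System.out.println("<![CDATA[" + content + "]]>");
    }
}
```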

在下西门庆

#4 · 2019-04-08 07:22

EDIT: cut all the irrelevant stuff

I'm curious to know what DOM implementation you're using, because it doesn't mirror the default behaviour of the one in a couple of JVMs I've tried (they ship with a Xerces impl). I'm also interested in what newline characters your document has.

I'm not sure it's a given that CDATA preserves whitespace. I suspect that there are many factors involved. Don't DTDs/schemas affect how whitespace is processed?

You could try using the xml:space="preserve" attribute.

我想做一个坏孩纸

#5 · 2019-04-08 07:24

xml:space='preserve' is not it. That is only for "all whitespace" nodes. That is, if you want the whitespace nodes in

<this xml:space='preserve'> <has/>
<whitespace/>
</this>

But note that those whitespace nodes are ONLY whitespace.

I have been struggling to get Xerces to generate events allowing isolation of CDATA content as well. I have no solution as yet.
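
One avenue that may be worth trying for isolating CDATA content is the standard SAX LexicalHandler interface, which reports startCDATA() and endCDATA() events. The sketch below is not the poster's code; it assumes the parser in use accepts the http://xml.org/sax/properties/lexical-handler property (the Xerces-based parser bundled with the JDK does), and the names CdataSaxDemo, CdataCollector and extractCdata are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class CdataSaxDemo {
    // Collects only the characters reported between startCDATA() and endCDATA().
    static class CdataCollector extends DefaultHandler2 {
        final StringBuilder cdata = new StringBuilder();
        private boolean inCdata;

        @Override
        public void startCDATA() {
            inCdata = true;
        }

        @Override
        public void endCDATA() {
            inCdata = false;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one section in several chunks, so append.
            if (inCdata) {
                cdata.append(ch, start, length);
            }
        }
    }

    // Hypothetical helper: returns the concatenated content of all CDATA
    // sections in the document.
    static String extractCdata(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CdataCollector handler = new CdataCollector();
        // Register the same object as lexical handler so CDATA boundaries are reported.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.cdata.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<TestElement> before <![CDATA[first line\nsecond line\n]]> after</TestElement>";
        System.out.println(extractCdata(xml));
    }
}
```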

可以哭但决不认输i

#6 · 2019-04-08 07:26

I don't know how you parse and write your document, but here's an enhanced code example based on yours:

// imports needed by the snippet below
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// creating the document in-memory
Document xmldoc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

Element element = xmldoc.createElement("TestElement");
xmldoc.appendChild(element);
element.appendChild(xmldoc.createCDATASection("first line\nsecond line\n"));

// serializing the xml to a string
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();

DOMImplementationLS impl =
    (DOMImplementationLS)registry.getDOMImplementation("LS");

LSSerializer writer = impl.createLSSerializer();
String str = writer.writeToString(xmldoc);

// printing the xml for verification of whitespace in cdata
System.out.println("--- XML ---");
System.out.println(str);

// de-serializing the xml from the string
// (writeToString declares UTF-16, so the bytes are encoded as UTF-16 too)
final Charset charset = Charset.forName("utf-16");
final ByteArrayInputStream input = new ByteArrayInputStream(str.getBytes(charset));
Document xmldoc2 = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input);

Node vs = xmldoc2.getElementsByTagName("TestElement").item(0);
final Node child = vs.getFirstChild();
String x = child.getNodeValue();

// print the value, yay!
System.out.println("--- Node Text ---");
System.out.println(x);

Serialization using LSSerializer is the W3C way to do it. The output is as expected, with the line separators preserved:

--- XML ---
<?xml version="1.0" encoding="UTF-16"?>
<TestElement><![CDATA[first line
second line
]]></TestElement>
--- Node Text ---
first line
second line
