I am trying to convert a document with content like the following into another document, leaving the CDATA exactly as it was in the first document, but I haven't figured out how to preserve the CDATA with XSLT.
Initial XML:
<node>
<subNode>
<![CDATA[ HI THERE ]]>
</subNode>
<subNode>
<![CDATA[ SOME TEXT ]]>
</subNode>
</node>
Final XML:
<newDoc>
<data>
<text>
<![CDATA[ HI THERE ]]>
</text>
<text>
<![CDATA[ SOME TEXT ]]>
</text>
</data>
</newDoc>
I've tried something like this, but no luck; everything gets jumbled:
<xsl:element name="subNode">
<xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:element>
Any ideas how to preserve the CDATA?
Thanks! Lance
Using ruby/nokogiri
Update: Here's something that works.
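<!-- Write the CDATA delimiters as raw, unescaped text around the element's text -->
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:value-of select="normalize-space(text())" disable-output-escaping="yes"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>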
That will wrap all text() nodes in CDATA, which works for what I need, and it will preserve html tags inside the text.
Sorry to post an answer to my own question, but I found something that works: the snippet from the update to the question.
You cannot preserve the precise sequence of CDATA nodes if they're mixed with plain text nodes. At best, you can force all content of a particular element in the output to be CDATA, by listing that element name in xsl:output/@cdata-section-elements.
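For the output in the question, the declaration might look roughly like this. It is only a sketch: it assumes the result element is named text, as in the question's Final XML, and that the stylesheet binds the usual xsl prefix.

```xml
<!-- Top-level child of xsl:stylesheet. "text" is the output element name
     taken from the question; text-node children of every <text> element
     in the result are then serialized as CDATA sections. -->
<xsl:output method="xml" cdata-section-elements="text"/>
```

Note that this is decided per element name in the output, not per node in the input, which is why a mix of CDATA and plain text inside one element cannot be reproduced.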
I found this article while trying to solve a similar problem (using an XSL transform to take one XML file and create a partial/subset copy of some of the nodes in it, as a second XML file). In my case the first XML files have some elements whose values are entirely wrapped in CDATA blocks, because they happen to be JSON and they carry some HTML formatting markup.
What I found was that rather than using xsl:value-of, I could use xsl:copy-of, and just as @Pavel Minaev points out, I could keep the original CDATA intact by listing every relevant element name in the xsl:output declaration. This might be an approach that would work for the OP.
XML to be copied (sample):
Relevant stylesheet lines:
The cdata-section-elements attribute means that in the output, the text content of each listed element is written as a CDATA section, so the original CDATA blocks in the XML copied from effectively come through intact when the transform runs. The attribute takes a whitespace-separated list of element names, so you can name as many elements as you want.
In the OP's example, I believe he would select on //node/subNode and then build an element named text, inside newDoc/data of course. His cdata-section-elements attribute would be simply "text", exactly as Pavel has it.
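Putting that together for the OP's documents, a complete stylesheet could look something like the sketch below. The element names come from the question; the template layout is my own assumption about how one might structure it, not something taken from the OP's actual stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text children of every <text> element in the result are written as CDATA -->
  <xsl:output method="xml" cdata-section-elements="text"/>

  <!-- Build the newDoc/data wrapper once, from the root <node> element -->
  <xsl:template match="/node">
    <newDoc>
      <data>
        <xsl:apply-templates select="subNode"/>
      </data>
    </newDoc>
  </xsl:template>

  <!-- Each <subNode> becomes a <text> element with the same string value.
       No disable-output-escaping is needed; the serializer adds the CDATA. -->
  <xsl:template match="subNode">
    <text>
      <xsl:value-of select="."/>
    </text>
  </xsl:template>

</xsl:stylesheet>
```

One thing to watch: any whitespace that surrounds the CDATA section inside subNode in the input is part of the element's string value, so it ends up inside the CDATA section in the output. Using normalize-space(.) in the xsl:value-of would trim it, at the cost of also collapsing whitespace inside the text.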