Below is the XML, which contains a CDATA section:
<?xml version="1.0" encoding="ISO-8859-1"?>
<character>
<name>
<role>Indiana Jones</role>
<actor>Harrison Ford</actor>
<part>protagonist</part>
<![CDATA[ <film>Indiana Jones and the Kingdom of the Crystal Skull</film>]]>
</name>
</character>
For the above XML, I need to strip off the CDATA wrapper and add a new element after the existing "film" element, so the final output will be:
<?xml version="1.0" encoding="ISO-8859-1"?>
<character>
<name>
<role>Indiana Jones</role>
<actor>Harrison Ford</actor>
<part>protagonist</part>
<film>Indiana Jones and the Kingdom of the Crystal Skull</film>
<Language>English</Language>
</name>
</character>
Can this be done using XSLT?
First, the fact that your input XML has "CDATA" is in one sense irrelevant: the XSLT can't tell whether it's CDATA or not. What's key about your input XML is that you have escaped markup, <film>...</film>, and you want to turn it into a real element.

If you know that the escaped element will always have a certain name ('film'), and you know where it occurs, you can strip it and replace it easily:
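The stylesheet from the original answer is not reproduced here. As a minimal sketch of the idea, assuming XSLT 1.0, assuming the escaped markup is always a single `film` element inside a text node of `name`, and assuming the hard-coded `Language` value from the question, something like this could work:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The CDATA content arrives as an ordinary text node. Pick the one
       that contains the escaped <film> tag and rebuild it as an element. -->
  <xsl:template match="name/text()[contains(., '&lt;film&gt;')]">
    <film>
      <xsl:value-of select="substring-before(
                              substring-after(., '&lt;film&gt;'),
                              '&lt;/film&gt;')"/>
    </film>
    <!-- Assumption: the new element and its value are fixed. -->
    <Language>English</Language>
  </xsl:template>
</xsl:stylesheet>
```

Because the title is extracted with plain string functions, this only holds up while the escaped `film` element has no attributes or nested markup.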
If you don't know in advance where the escaped tags will occur and what the element names are, you could use XSLT 2.0's <xsl:analyze-string> to find and replace them. But as Alejandro pointed out, general parsing of XML using regular expressions can get very messy. It would only be feasible if you know the markup will be simple.

Since the film element in the CDATA block appears to be well-formed, you could use disable-output-escaping. Match on name/text(), output it using xsl:value-of with DOE enabled, and then insert the Language element immediately following. A slightly modified identity transform should work.
Given this XML:
Using this XSLT:
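The stylesheet originally posted here is missing. The following is only a sketch of the approach described above, assuming XSLT 1.0 and a processor that honours `disable-output-escaping` (it is an optional feature and only takes effect when the result is serialized). The `contains()` predicate is an added assumption, so that the whitespace-only text nodes inside `name` do not each get a `Language` element:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="ISO-8859-1"/>

  <!-- Identity template: copy all other nodes unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Write the text node holding the escaped film markup without
       escaping, so the serializer emits real tags, then add Language. -->
  <xsl:template match="name/text()[contains(., '&lt;film&gt;')]">
    <xsl:value-of select="normalize-space(.)"
                  disable-output-escaping="yes"/>
    <Language>English</Language>
  </xsl:template>
</xsl:stylesheet>
```

Note that with DOE the `film` element exists only in the serialized output, not in the result tree, so a later stage in the same pipeline would still see plain text unless the output is re-parsed.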
Will produce this output:
(Tested using Saxon-HE 9.3.0.5 in oXygen 12.2.)
I was dealing with something similar and found a good solution, so I thought I'd share it with you, although this one is for NSXMLParser.

If you're using NSXMLParser, there's a delegate method called foundCDATA, which can look like this:

Now add this prewritten class to your project. Then import it into the parser class you want to use it in:
#import "NSString_stripHTML.h"
Now you can simply add the following line to the foundCDATA method:

Now you will have the stripped text without any extra characters. You can take whatever substring you want from this stripped text.
Another way to solve this, which would give you more control over the transformation, is to use Andrew Welch's LexEv XMLReader. Among other things, it lets you process CDATA sections as markup.