XML, XSL transformation with CDATA

Posted 2019-07-19 21:47

Question:

I am new to XSL transformations of XML. I have an XML file that contains the following information:

<bio>
<published>Tue, 7 Oct 2008 14:47:26 +0000</published>
<summary><![CDATA[
   Dream Theater is an American <a
   href="http://www.last.fm/tag/progressive%20metal" class="bbcode_tag"
   rel="tag">progressive metal</a> band formed in 1985 under the name
   &quot;<a href="http://www.last.fm/music/Majesty"
   class="bbcode_artist">Majesty</a>&quot; by <a
   href="http://www.last.fm/music/John+Myung"
   class="bbcode_artist">John Myung</a>,
   <a href="http://www.last.fm/music/John+Petrucci"
   class="bbcode_artist">John Petrucci</a>
]]>
</summary>
</bio>

And my xsl file contains this:

<h3><xsl:value-of select="lfm/artist/bio/published"/></h3>
<p>
   <xsl:value-of select="lfm/artist/bio/summary" disable-output-escaping="yes"/>
</p>
<html>
   <body>
      <xsl:value-of select="lfm/artist/bio/content"/>
   </body>
</html>

What I'm trying to do now is extract the tag-structured data from <summary><![CDATA[ ... ]]></summary> and show it in the browser as real markup, as in this example:

<a href="http://www.last.fm/tag/progressive%20metal" class="bbcode_tag" rel="tag">progressive metal</a>
<a href="http://www.last.fm/music/Majesty" class="bbcode_artist">Majesty</a>
<a href="http://www.last.fm/music/John+Myung" class="bbcode_artist">John Myung</a>
<a href="http://www.last.fm/music/John+Petrucci" class="bbcode_artist">John Petrucci</a>

For now, when I open the XML page, it shows the whole CDATA content as plain text, HTML tags included. I want those tags to do their job and be rendered as HTML.

Sorry for the terrible description. I hope you can see what I mean.

Answer 1:

A CDATA section is just (part of) a text node. What looks like markup inside it is flat, one-dimensional text (destroyed markup), so in XSLT 1.0 and XSLT 2.0 it cannot be turned back into nodes without calling an extension function, for example:

<p><xsl:copy-of select="my:parse(lfm/artist/bio/summary)"/></p>

XSLT 3.0 provides the standard functions parse-xml() and parse-xml-fragment(), which do exactly this.
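
As a rough sketch of the XSLT 3.0 route, assuming a processor that supports XSLT 3.0 (for example a recent Saxon release), the stylesheet below re-parses the text of summary without any extension function. It uses parse-xml-fragment() rather than parse-xml(), because the CDATA content has no single root element and so is not a well-formed document on its own. This has not been run against the asker's full file, so treat it as a starting point:

```xml
<xsl:stylesheet version="3.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- XSLT 3.0 shorthand for the identity template: copy everything by default -->
 <xsl:mode on-no-match="shallow-copy"/>

 <!-- Re-parse the text of <summary> and copy only the resulting elements -->
 <xsl:template match="summary/text()">
  <xsl:copy-of select="parse-xml-fragment(.)/*"/>
 </xsl:template>
</xsl:stylesheet>
```

Selecting parse-xml-fragment(.)/node() instead of parse-xml-fragment(.)/* should also keep the text between the links, which is probably what you want when rendering the summary as HTML.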

Update:

Here is a complete code example, assuming you are using XslCompiledTransform in .NET:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:msxsl="urn:schemas-microsoft-com:xslt"
 xmlns:my="my:my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="summary/text()">
  <xsl:copy-of select="my:parse(.)/*/*"/>
 </xsl:template>

 <msxsl:script language="c#" implements-prefix="my">
  public XmlDocument parse(string text)
  {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml("&lt;t>"+text+"&lt;/t>");

    return doc;
  }
 </msxsl:script>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<bio>
 <published>Tue, 7 Oct 2008 14:47:26 +0000</published>
 <summary><![CDATA[Dream Theater is an American <a href="http://www.last.fm/tag/progressive%20metal" class="bbcode_tag" rel="tag">progressive metal</a> band formed in 1985 under the name &quot;<a href="http://www.last.fm/music/Majesty" class="bbcode_artist">Majesty</a>&quot; by <a href="http://www.last.fm/music/John+Myung" class="bbcode_artist">John Myung</a>, <a href="http://www.last.fm/music/John+Petrucci" class="bbcode_artist">John Petrucci</a>]]>
 </summary>
</bio>

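the wanted, correct result is produced: the CDATA section is replaced by the reconstituted markup. Note that my:parse(.)/*/* selects only the child elements of the wrapper element, so the text between the links is not copied: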

<bio>
  <published>Tue, 7 Oct 2008 14:47:26 +0000</published>
  <summary>
    <a href="http://www.last.fm/tag/progressive%20metal" class="bbcode_tag" rel="tag">progressive metal</a>
    <a href="http://www.last.fm/music/Majesty" class="bbcode_artist">Majesty</a>
    <a href="http://www.last.fm/music/John+Myung" class="bbcode_artist">John Myung</a>
    <a href="http://www.last.fm/music/John+Petrucci" class="bbcode_artist">John Petrucci</a>
  </summary>
</bio>