XSLT: Reading content that is divided by an empty tag

Posted 2019-04-15 13:09


So I'm busy creating an XSLT file to transform various XML documents into a new node layout.

There's one thing I can't figure out. Here is an example of the XML I'm working with:

   This is a paragraph on the page.
   This is another paragraph.
   Here is yet another paragraph on this page.

As you can see, the paragraphs are split up using empty tags as dividers. In the resulting XML I want this:

    This is a paragraph on the page.
    This is another paragraph.
   Here is yet another paragraph on this page.

How can I achieve this using XSLT (Version 1.0 only)?


This is more or less a duplicate of another question, so the same approach will work:

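<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="pages">
    <xsl:apply-templates />
  </xsl:template>
  <!-- each text node directly inside a page becomes one paragraph -->
  <xsl:template match="page/text()">
    <p><xsl:value-of select="."/></p>
  </xsl:template>
  <!-- the divider elements themselves produce no output -->
  <xsl:template match="NewParagraph" />
</xsl:stylesheet>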

Simple and clean. Hope it helps.
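For reference, this is a minimal input of the shape the templates above appear to assume. The element names pages, page and NewParagraph are inferred from the match patterns, so the question's exact markup may differ:

```xml
<pages>
  <page>
    This is a paragraph on the page.
    <NewParagraph/>
    This is another paragraph.
    <NewParagraph/>
    Here is yet another paragraph on this page.
  </page>
</pages>
```

With this input, each of the three text nodes inside page becomes its own p element (surrounding whitespace included, since value-of outputs the whole text node) and both NewParagraph elements produce nothing. The pages template adds no wrapper element, so if the result must have a single root element, put a literal element such as pages around its xsl:apply-templates.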


The following answer is not as elegant as @stwissel's, but it correctly handles any tag subtrees inside the paragraphs. It did become a little nasty, indeed. :-)

The problem with this task is that it requires special handling of what is between a closing tag and the following opening tag (e.g. </tag> ... <tag>). XSLT, however, is optimized for handling what is between an opening tag and its matching closing tag (e.g. <tag> ... </tag>). By the way, there's a way to "cheat" a little bit; see my other answer to this question.

Suppose you have an input XML as follows:

    This is a paragraph on the page.
    After Bold
    This is another paragraph.
    Here is yet another paragraph on this page.
        Bold and emphasized.
    After bold and emphasized.
    Another page.

It can be processed using this XSLT 1.0 transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />

  <xsl:template match="page">
    <page>
      <!-- handle the first paragraph up to the first newParagraph -->
      <p>
        <xsl:apply-templates select="node()[not(preceding-sibling::newParagraph)]" />
      </p>

      <!-- now handle all remaining paragraphs of the page -->
      <xsl:for-each select="newParagraph">
        <xsl:variable name="pCount" select="position()"/>
        <!-- following siblings with exactly pCount dividers before them,
             i.e. the nodes up to the next newParagraph -->
        <p>
          <xsl:apply-templates select="following-sibling::node()[count(preceding-sibling::newParagraph) &lt;= $pCount]" />
        </p>
      </xsl:for-each>
    </page>
  </xsl:template>

  <!-- this default rule recursively copies all substructures within a paragraph at tag level -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- this default rule makes sure that texts between the tags are printed -->
  <xsl:template match="text()">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- the dividers themselves produce no output -->
  <xsl:template match="newParagraph"/>
</xsl:stylesheet>


which produces this output:

    This is a paragraph on the page.
    After Bold
    This is another paragraph.
    Here is yet another paragraph on this page.
        Bold and emphasized.
    After bold and emphasized.
    Another page.
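A different XSLT 1.0 technique, not taken from either answer, is key-based grouping: index every child of a page by the nearest newParagraph before it, then emit one p per group. The sketch below assumes the lowercase newParagraph divider and the pages/page structure used above; the key name para-nodes and the named template make-p are hypothetical. The page's own generate-id() is part of the key so that the first group of one page is not merged with the first group of another page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Index each non-divider child of a page by "page id|id of the nearest
       preceding newParagraph". Nodes before the first divider get "page id|". -->
  <xsl:key name="para-nodes"
           match="page/node()[not(self::newParagraph)]"
           use="concat(generate-id(..), '|', generate-id(preceding-sibling::newParagraph[1]))"/>

  <!-- Identity rule: copy elements, attributes and text unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="page">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- first group: everything before the first divider -->
      <xsl:call-template name="make-p">
        <xsl:with-param name="nodes" select="key('para-nodes', concat(generate-id(), '|'))"/>
      </xsl:call-template>
      <!-- one further group after each divider (context is the divider here) -->
      <xsl:for-each select="newParagraph">
        <xsl:call-template name="make-p">
          <xsl:with-param name="nodes"
                          select="key('para-nodes', concat(generate-id(..), '|', generate-id()))"/>
        </xsl:call-template>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- Wrap one group in <p>, skipping groups that contain only whitespace -->
  <xsl:template name="make-p">
    <xsl:param name="nodes"/>
    <xsl:if test="$nodes[self::* or normalize-space()]">
      <p><xsl:apply-templates select="$nodes"/></p>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Because each group is fetched through the key, this sketch does not re-count the preceding dividers of every following sibling the way the count() predicate does, and a whitespace-only group (for example after a trailing divider) is skipped instead of becoming an empty p. Inline elements such as bold text are copied into their paragraph by the identity rule.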


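If you are willing to "cheat" a little bit, you can write XML tags into the result document as plain text, using disable-output-escaping, instead of as nodes of the result tree. A downstream processor will not notice the difference, provided that it re-parses the serialized output. Keep in mind that XSLT 1.0 processors are not required to support disable-output-escaping, and it may be ignored when the result tree is not serialized (for example, when it is passed on to another transformation).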

Given the input from my other answer, the following XSLT 1.0 transformation will do the trick (preserving the subtrees in the paragraphs):

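<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />

  <xsl:template match="page">
    <page>
      <!-- open the first paragraph; each newParagraph closes and reopens it -->
      <P>
        <xsl:apply-templates />
      </P>
    </page>
  </xsl:template>

  <!-- this default rule recursively copies all substructures within a paragraph at tag level -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- this default rule makes sure that texts between the tags are printed -->
  <xsl:template match="text()">
    <xsl:copy-of select="."/>
  </xsl:template>

  <xsl:template match="newParagraph">
    <!-- This inserts a matching closing and opening tag as raw text -->
    <xsl:value-of select="'&lt;/P&gt;&lt;P&gt;'" disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>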
