Grouping XML nodes between each two processing instructions

Published 2019-06-11 06:14

Question:

Using XSLT 1.0, I need to wrap the nodes between each pair of processing instructions <?start?> and <?end?> in a group element. Nodes outside such a pair should be copied to the output unchanged.

First, I need a way to select only the nodes that lie between each start-end pair. Suppose we have the following example input XML:

<root>
  abc
  <?start?>
    def<Highlighted bold="yes">
    <Highlighted italic="yes">ghi</Highlighted>
    </Highlighted>jkl
    <?pi?>
    <table>
      <Caption>stu</Caption>
    </table>vw
  <?end?>
  xy
  <?start?> 
  abc <Caption>def</Caption> ghi
  <?end?>
  jkl
</root>

Furthermore, the nodes OUTSIDE the start-end ranges must appear in the output as well. That is: a) nodes between a start PI and its end PI are wrapped in a group element; b) any node outside such a range is output unchanged. Note that the input document may contain no start-end processing instruction pair at all.

For example, from the given input, the output should be as follows:

<root>
  abc
  <group>
    def<Highlighted bold="yes">
    <Highlighted italic="yes">ghi</Highlighted>
    </Highlighted>jkl
    <?pi?>
    <table>
      <Caption>stu</Caption>
    </table>vw
  </group>
  xy
  <group>
    abc <Caption>def</Caption> ghi
  </group>
  jkl
</root>

Part of this question has already been answered in "Finding all XML nodes between each two processing instructions". But I struggle with outputting the nodes outside the group elements without duplicating any node.

Answer 1:

Code:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Copy the node, descend into its attributes and first child only,
         then move on to the node's next sibling. -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()[1]"/>
        </xsl:copy>
        <xsl:apply-templates select="following-sibling::node()[1]"/>
    </xsl:template>

    <!-- Open a group, walk the siblings that follow the start PI,
         then resume with the node after the matching end PI. -->
    <xsl:template match="processing-instruction('start')">
        <group>
            <xsl:apply-templates select="following-sibling::node()[1]"/>
        </group>
        <xsl:apply-templates select="following-sibling::processing-instruction('end')[1]/following-sibling::node()[1]"/>
    </xsl:template>

    <!-- The end PI produces no output and stops the sibling walk inside the group. -->
    <xsl:template match="processing-instruction('end')"/>

</xsl:stylesheet>

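This XSLT takes a recursive approach: each template processes a node's first child and then the node's first following sibling, instead of applying templates to all children at once.

The first template copies every node except the start and end PIs. The second template wraps the nodes between a start PI and the next end PI in a group element, then continues with the node after that end PI. The third template matches the end PI and outputs nothing, which both removes the PI and stops the sibling walk inside the group.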
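
As a quick check of the traversal described above, here is a minimal input of my own (hypothetical, not taken from the question) that is small enough to trace by hand:

```xml
<root>a<?start?>b<x/><?end?>c</root>
```

Walking through the templates: root is copied and its first child "a" is processed; the sibling walk reaches the start PI, which opens the group; "b" and x are copied inside it; the end PI stops the walk; the start template then resumes at "c". Apart from whatever whitespace the serializer adds for indent="yes", the result should be:

```xml
<root>a<group>b<x/></group>c</root>
```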



Answer 2:

Change the template for root to

<xsl:template match="root">
  <xsl:copy>
    <xsl:apply-templates select="key('start', '') | processing-instruction('start') | node()[preceding-sibling::processing-instruction()[1][self::processing-instruction('end')]] | key('end', '')"/>          
  </xsl:copy>
</xsl:template>

so the whole code becomes

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <xsl:output indent="yes"/>

    <xsl:key name="start" match="root/node()[not(self::processing-instruction('start'))]" use="generate-id(preceding-sibling::processing-instruction('start')[1])"/>
    <xsl:key name="end" match="root/node()[not(self::processing-instruction('end'))]" use="generate-id(following-sibling::processing-instruction('end')[1])"/>

    <xsl:template match="root">
      <xsl:copy>
        <xsl:apply-templates select="key('start', '') | processing-instruction('start') | node()[preceding-sibling::processing-instruction()[1][self::processing-instruction('end')]] | key('end', '')"/>          
      </xsl:copy>
    </xsl:template>

    <xsl:template match="processing-instruction('start')">
        <xsl:variable name="end" select="following-sibling::processing-instruction('end')[1]"/>
        <xsl:variable name="following-start" select="key('start', generate-id())"/>
        <xsl:variable name="preceding-end" select="key('end', generate-id($end))"/>
        <xsl:variable name="intersect" select="$following-start[count(. | $preceding-end) = count($preceding-end)]"/>
        <group>
            <xsl:copy-of select="$intersect"/>
        </group>
    </xsl:template>

</xsl:stylesheet>

Not very well tested, but I hope it covers all nodes before, between, and after the PIs.
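
One thing to watch for with the stylesheet above: it has no template for ordinary nodes, so an element that sits outside a start-end pair falls through to XSLT's built-in rules and only its text is output. The question's sample input has only text outside the groups, so this does not show up there. Below is a hedged variant, not the answerer's code, with three additions that are my own assumptions: an identity template, an empty template for stray end PIs, and a "nearest preceding PI" test that looks only at start and end PIs, so that an unrelated PI such as <?pi?> between two groups does not hide the nodes after it.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <xsl:output indent="yes"/>

    <xsl:key name="start" match="root/node()[not(self::processing-instruction('start'))]" use="generate-id(preceding-sibling::processing-instruction('start')[1])"/>
    <xsl:key name="end" match="root/node()[not(self::processing-instruction('end'))]" use="generate-id(following-sibling::processing-instruction('end')[1])"/>

    <!-- Assumption (added): identity template, so that elements, comments and
         unrelated PIs outside a group are copied rather than flattened or dropped. -->
    <xsl:template match="@* | node()">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:template>

    <xsl:template match="root">
      <xsl:copy>
        <!-- Assumption (changed): only start and end PIs count when looking
             for the nearest preceding marker. -->
        <xsl:apply-templates select="key('start', '') | processing-instruction('start') | node()[preceding-sibling::processing-instruction()[self::processing-instruction('start') or self::processing-instruction('end')][1][self::processing-instruction('end')]] | key('end', '')"/>
      </xsl:copy>
    </xsl:template>

    <!-- Unchanged: group the nodes that follow this start PI and precede its end PI. -->
    <xsl:template match="processing-instruction('start')">
        <xsl:variable name="end" select="following-sibling::processing-instruction('end')[1]"/>
        <xsl:variable name="following-start" select="key('start', generate-id())"/>
        <xsl:variable name="preceding-end" select="key('end', generate-id($end))"/>
        <xsl:variable name="intersect" select="$following-start[count(. | $preceding-end) = count($preceding-end)]"/>
        <group>
            <xsl:copy-of select="$intersect"/>
        </group>
    </xsl:template>

    <!-- Assumption (added): any end PI that gets selected is dropped, not copied. -->
    <xsl:template match="processing-instruction('end')"/>

</xsl:stylesheet>
```

Like the original, this still assumes that every start PI has a matching end PI among the children of root; I have not tried it against unbalanced input.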



Answer 3:

Although I like the recursive solution from @Lingamurthy CS, here is a slightly different key-based solution.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
    <xsl:output indent="yes"/>

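    <!-- Index each child of root by the id of its nearest preceding start or end PI;
         nodes with no such PI before them get the empty string as key. -->
    <xsl:key name="pi" match="root/node()"
             use="generate-id(preceding-sibling::processing-instruction()
                  [self::processing-instruction('start') or self::processing-instruction('end')][1])"/>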

    <xsl:template match="@* | node()">
      <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:template>

    <xsl:template match="root">
      <xsl:copy>
        <xsl:apply-templates select="processing-instruction('start') | 
                                     processing-instruction('end') |
                                     key('pi', '')"/>          
      </xsl:copy>
    </xsl:template>

    <xsl:template match="processing-instruction('start')">
        <group>
          <xsl:apply-templates 
             select="key('pi', generate-id())
                 [not(self::processing-instruction('end'))]" />
        </group>
    </xsl:template>

    <xsl:template match="processing-instruction('end')">
        <xsl:apply-templates
           select="key('pi', generate-id())
               [not(self::processing-instruction('start'))]" />
    </xsl:template>

</xsl:stylesheet>


Tags: xslt xpath