Removing consecutive duplicates with XSLT

Posted 2019-07-18 13:36

Question:

I have some XML in which I would like to remove identical consecutive child nodes that sit in different parents. That is, if the same child node appears in two or more consecutive parent nodes of my XML tree, I want to keep the first occurrence and remove all the duplicates.

The duplicate nodes I'm thinking of are the <child>a</child> in the first two <parent> nodes.

An example:

Here is the source XML:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>a</child>
      <child>bb</child>
      <child>cc</child>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

   <parent>
      <child>a</child>
      <child>bbbb</child>
      <child>cccc</child>
   </parent>

</root>

Here is the desired XML:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>bb</child>
      <child>cc</child>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

   <parent>
      <child>a</child>
      <child>bbbb</child>
      <child>cccc</child>
   </parent>

</root>

Only one element is removed here, but if there were, for example, five consecutive <child>a</child> nodes at the beginning (instead of two), four of them would be removed. I'm using XSLT 2.0.

I appreciate any help.

Follow-Up:

Thanks to Kirill I get the documents I want. However, this has raised a new problem that I didn't anticipate. Suppose I have an XML document like this:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

</root>

And I apply Kirill's XSLT, I get this:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

</root>

How can I also remove the empty <parent> </parent>? In my application there may be other subelements of <parent>, which are OK to remove if there is no <child> element in the <parent> element.

A solution I have, but don't like, is to apply another transform after the first one. This only works when the two are applied in order, though, and it requires a separate XSLT file and two commands instead of one.

Here it is:

 <xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
 </xsl:template>

 <xsl:template match="parent[not(child)]"/>

Answer 1:

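Use the identity transform plus one empty template that suppresses any <child> whose value equals a <child> in the immediately preceding <parent>: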

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop a child that also occurs in the immediately preceding parent -->
    <xsl:template match="child[../preceding-sibling::parent[1]/child = .]"/>

</xsl:stylesheet>


Answer 2:

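If you're able to use XSLT 2.0, the problem can be solved with adjacent grouping. Note that this groups on the first <child> only, so it assumes that the first <child> of each <parent> is the one that can be duplicated: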

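<xsl:for-each-group select="parent" group-adjacent="child[1]">
  <xsl:for-each select="current-group()">
    <parent>
      <!-- Emit the shared first child only for the first parent of each run -->
      <xsl:if test="position()=1">
        <xsl:copy-of select="child[1]"/>
      </xsl:if>
      <!-- Copy this parent's own remaining children; current-group()/child would pull in those of every parent in the run -->
      <xsl:copy-of select="child[position() gt 1]"/>
    </parent>
  </xsl:for-each>
</xsl:for-each-group>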
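
The fragment above needs a surrounding template to run. A minimal sketch of a complete stylesheet might look like the following. It rests on a few assumptions that are not part of the answer: the template is matched on the <root> element from the question, the grouping key is wrapped in string() so that a <parent> without any <child> yields an empty-string key rather than an empty sequence, and anything under <root> other than <parent> elements is simply not copied.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Assumption: the parents to group are direct children of root -->
    <xsl:template match="root">
        <xsl:copy>
            <!-- Adjacent parents with the same first child form one group -->
            <xsl:for-each-group select="parent" group-adjacent="string(child[1])">
                <xsl:for-each select="current-group()">
                    <parent>
                        <!-- Only the first parent of a run keeps the shared first child -->
                        <xsl:if test="position() = 1">
                            <xsl:copy-of select="child[1]"/>
                        </xsl:if>
                        <!-- Every parent keeps its own remaining children -->
                        <xsl:copy-of select="child[position() gt 1]"/>
                    </parent>
                </xsl:for-each>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Because the grouping is adjacent rather than global, the <child>a</child> in the fourth <parent> of the question's source document starts a new group and is kept.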


Answer 3:

This answers the newly added follow-up question:

How can I also remove the <parent> </parent>? For my application there may be other subelements of <parent>, which are OK to remove if there is no <child> element in the <parent> element.

This transformation is an add-on to Kirill's and removes any parent element that would otherwise end up empty, without the need for a second pass:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="child[../preceding-sibling::parent[1]/child = .]"/>

  <xsl:template match=
  "parent
     [not(child
          [not(. = ../preceding-sibling::parent[1]
                                              /child
               )
           ]
          )
     ]"/>
</xsl:stylesheet>

when applied to the provided XML document:

<root>
   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>a</child>
      <child>b</child>
      <child>c</child>
   </parent>

   <parent>
      <child>aaa</child>
      <child>bbb</child>
      <child>ccc</child>
   </parent>

</root>

the wanted, correct result is produced:

<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
</root>
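
Since the question mentions XSLT 2.0, a different technique is also available: keep the two passes but run them inside a single stylesheet, holding the first pass in a temporary tree. The following is only a sketch of that alternative, not part of the answer above. The mode names dedupe and cleanup and the variable name pass1 are made up for illustration.

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Run pass 1 into a temporary tree, then run pass 2 over that tree -->
    <xsl:template match="/">
        <xsl:variable name="pass1">
            <xsl:apply-templates mode="dedupe"/>
        </xsl:variable>
        <xsl:apply-templates select="$pass1/node()" mode="cleanup"/>
    </xsl:template>

    <!-- Identity copy shared by both modes -->
    <xsl:template match="@* | node()" mode="dedupe cleanup">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" mode="#current"/>
        </xsl:copy>
    </xsl:template>

    <!-- Pass 1: Kirill's rule, dropping children repeated from the preceding parent -->
    <xsl:template match="child[../preceding-sibling::parent[1]/child = .]" mode="dedupe"/>

    <!-- Pass 2: the rule from the question, dropping parents left without any child -->
    <xsl:template match="parent[not(child)]" mode="cleanup"/>
</xsl:stylesheet>

This keeps each rule as simple as in the two separate files, at the cost of building the intermediate tree in memory. The single-pass version above avoids that.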


Tags: xml xslt xpath