I have some XML from which I would like to remove identical consecutive child nodes that sit in different parents. That is, if the same child node appears two or more times in consecutive parents of my XML tree, I want to keep the first occurrence and remove all the duplicates.
The duplicate nodes I'm thinking of are the <child>a</child> nodes in the first two <parent> nodes.
An example:
Here is the source XML:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>a</child>
    <child>bb</child>
    <child>cc</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
  <parent>
    <child>a</child>
    <child>bbbb</child>
    <child>cccc</child>
  </parent>
</root>
Here is the desired XML:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>bb</child>
    <child>cc</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
  <parent>
    <child>a</child>
    <child>bbbb</child>
    <child>cccc</child>
  </parent>
</root>
Only one element is removed here, but if there were, for example, five consecutive <child>a</child> nodes at the beginning (instead of two), four of them would be removed. I'm using XSLT 2.0.
I appreciate any help.
Follow-Up:
Thanks to Kirill I get the documents I want. However, this has spawned a new problem that I didn't anticipate. If I have an XML document like this:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
</root>
and I apply Kirill's XSLT, I get this:
<root>
  <parent>
    <child>a</child>
    <child>b</child>
    <child>c</child>
  </parent>
  <parent>
  </parent>
  <parent>
    <child>aaa</child>
    <child>bbb</child>
    <child>ccc</child>
  </parent>
</root>
How can I also remove the empty <parent> </parent> element? In my application <parent> may have other subelements, and it is OK to remove those along with it if the <parent> element contains no <child> element.
A solution I have, but don't like, is to apply another transform after the first one. This only works when the two are applied in order, though, and it means I need a separate XSLT file and have to run two commands instead of one.
Here it is:
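<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity rule: copy every node and attribute unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>
  <!-- Drop any parent that has no child element left. -->
  <xsl:template match="parent[not(child)]"/>
</xsl:stylesheet>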
Use:
If you're able to use XSLT 2.0, the problem is solved as follows:
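As an illustration of the kind of transformation being discussed, here is a minimal sketch and not necessarily the stylesheet from this answer. It assumes that a <child> counts as a consecutive duplicate when the immediately preceding <parent> contains a <child> with the same string value; that reading of "consecutive" is an assumption taken from the question's example. The comparison is made against the source tree, so a run of five identical nodes loses four of them.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy everything that no other template overrides. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: a child is a duplicate when the nearest preceding
       parent holds a child with the same string value. An empty
       template emits nothing, so the duplicate is dropped. -->
  <xsl:template match="child[. = ../preceding-sibling::parent[1]/child]"/>
</xsl:stylesheet>
```

Applied to the source XML from the question, this drops only the <child>a</child> in the second <parent>. The one in the fourth <parent> stays, because the third <parent> has no matching child. The pattern uses nothing specific to XSLT 2.0, so it should behave the same under an XSLT 1.0 processor.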
This answers the newly added followup question:
This transformation is an add-on to Kirill's and accomplishes the desired cleanup of the would-be resulting empty parent element without the need of a second pass:
when applied to the provided XML document:
the wanted, correct result is produced:
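A minimal sketch of how such a single-pass cleanup might look, offered as an illustration rather than as the transformation referred to above. It reuses the same assumed duplicate test (a <child> is a duplicate when the nearest preceding <parent> holds a <child> with an equal string value) and adds one rule that suppresses any <parent> in which no <child> would survive that test.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy everything that no other template overrides. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed duplicate test: drop a child whose value also occurs
       among the children of the nearest preceding parent. -->
  <xsl:template match="child[. = ../preceding-sibling::parent[1]/child]"/>

  <!-- Drop a parent, together with any other subelements, when none
       of its children would survive the duplicate test above. This
       also removes a parent that had no child elements to begin with. -->
  <xsl:template match="parent[not(child[not(. = ../preceding-sibling::parent[1]/child)])]"/>
</xsl:stylesheet>
```

With the three-parent document from the follow-up, the second <parent> is removed entirely, while the first and third are copied unchanged. Because both rules test the source tree rather than the output, no second stylesheet or second command is needed.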