I have been trying to write an XPath expression or XSLT stylesheet to detect and eliminate duplicate nodes. In my case, duplicate nodes are nodes whose attributes have the same values. I want to eliminate duplicates by excluding the last occurrence of each duplicate node. Please advise if there is another method.
Please note: duplicate nodes = nodes with the same values of the operator1, operator2 and operator3 attributes.
XML:
<data id = "root">
<record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/>
<record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="8" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/>
</data>
Output I need:
<data id = "root">
<record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>
The closest I have come is with this XPath, but it doesn't work exactly.
"//record[(./@operator1 = following-sibling::record/@operator1) and (./@operator2 = following-sibling::record/@operator2) and (./@operator3 = following-sibling::record/@operator3)]".
I have searched the whole internet without any luck. Any help is really appreciated. Thanks a lot.
I have been trying to write an
XPath/XSLT for my problem of detecting
and eliminating duplicate nodes. In my
case, duplicate nodes are nodes with
multiple attributes with same values.
The way I want to eliminate duplicate
is by excluding the last occurrence of
the duplicate node. Please advice if
there is any other method.
Pls Note: Duplicate nodes = Nodes with
same values of operator1, operator2
and operator3 attributes.
This is a self-contradictory definition of duplicate-node elimination.
You are not eliminating duplicate nodes by just removing the last of a sequence of duplicates. Your desired result:
<data id = "root">
<record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>
still contains duplicates, such as the record elements with id 1, 4, and 6, and those with id 2, 3, and 7.
Proper duplicate elimination, also called deduplication, requires keeping exactly one item from each group of duplicates. In XSLT 1.0 this is traditionally accomplished with the Muenchian method for grouping:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index every record by its three operator attributes -->
  <xsl:key name="kRecByAtts" match="record"
           use="concat(@operator1, '***', @operator2, '***', @operator3)"/>

  <!-- Identity rule: copy every node and attribute unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop every record that is not the first one in its key group -->
  <xsl:template match=
    "record[not(generate-id()
                = generate-id(key('kRecByAtts',
                                  concat(@operator1, '***',
                                         @operator2, '***',
                                         @operator3))[1]))]"/>
</xsl:stylesheet>
When this transformation is applied to the XML document below (the desired result quoted above, which still contains duplicates):
<data id = "root">
<record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>
the wanted, correct result is produced:
<data id="root">
<record id="1" operator1="xxx" operator2="yyy" operator3="zzz"/>
<record id="2" operator1="abc" operator2="yyy" operator3="zzz"/>
</data>
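If an XSLT 2.0 processor is available (an assumption; the question does not say which version is in use), the same deduplication can be sketched without keys by using xsl:for-each-group. The grouping key below reuses the same concat() expression as the key above. Inside xsl:for-each-group the context item is the first record of each group, so copying it keeps exactly one record per group.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="data">
    <xsl:copy>
      <!-- keep the attributes of data, e.g. id="root" -->
      <xsl:copy-of select="@*"/>
      <!-- one group per distinct operator1/operator2/operator3 combination -->
      <xsl:for-each-group select="record"
          group-by="concat(@operator1, '***', @operator2, '***', @operator3)">
        <!-- the context item is the first record of the current group -->
        <xsl:copy-of select="."/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the six-record document above, this sketch should keep the same two records (id 1 and id 2). Applied to the original ten-record document, it should keep the records with id 1, 2, 5 and 10, one per distinct combination of attribute values.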
Here is an example XSLT 1.0 stylesheet that eliminates the last record element in each group of duplicates:
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <!-- Index every record by its three operator attributes -->
  <xsl:key name="k1" match="record"
           use="concat(@operator1, '|', @operator2, '|', @operator3)"/>

  <!-- Identity rule: copy every node and attribute unchanged -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the last record of each key group (a group of one loses its only record) -->
  <xsl:template match="record[generate-id()
                              = generate-id(key('k1',
                                                concat(@operator1, '|',
                                                       @operator2, '|',
                                                       @operator3))[last()])]"/>
</xsl:stylesheet>
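For comparison, the same "drop the last occurrence" rule can be sketched without a key, closer to the XPath attempted in the question. That XPath tests each attribute against the following siblings separately, so in general the three comparisons can be satisfied by three different siblings. The sketch below puts all three comparisons inside one predicate, so a single following sibling has to match all of them. The xsl:output and xsl:strip-space settings are borrowed from the other answer and are not essential.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="data">
    <xsl:copy>
      <!-- keep the attributes of data, e.g. id="root" -->
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="record">
        <!-- current() is the record being visited by xsl:for-each;
             keep it only if ONE later sibling matches all three attributes -->
        <xsl:if test="following-sibling::record
                        [@operator1 = current()/@operator1
                         and @operator2 = current()/@operator2
                         and @operator3 = current()/@operator3]">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

On the ten-record input from the question this should keep the records with id 1, 2, 3, 4, 6 and 7, the same as the key-based stylesheet. The key-based version is likely to scale better on large inputs, because this one rescans the following siblings for every record.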