I have been trying to write an XPath expression or XSLT stylesheet to detect and eliminate duplicate nodes. In my case, duplicate nodes are nodes whose attributes have the same values. I want to eliminate duplicates by excluding the last occurrence of each duplicate node. Please advise if there is a better method.
Please note: duplicate nodes = nodes with the same values of the operator1, operator2 and operator3 attributes.
XML:
<data id = "root">
<record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/>
<record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="8" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/>
</data>
Output I need:
<data id = "root">
<record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>
The closest I have come is with this XPath, but it doesn't work exactly.
"//record[(./@operator1 = following-sibling::record/@operator1) and (./@operator2 = following-sibling::record/@operator2) and (./@operator3 = following-sibling::record/@operator3)]".
I have searched the whole internet without any luck. Any help is really appreciated. Thanks a lot.
This is a contradictory definition of duplicate-node elimination.
You are not eliminating duplicate nodes by just removing the last of a sequence of duplicates. Your desired result still contains duplicates, such as the record elements with id (1, 4, and 6) and (2, 3, and 7).
Proper duplicate elimination, also called deduplication, requires leaving only one item from each group of duplicates. This is traditionally accomplished in XSLT 1.0 by using the Muenchian method for grouping.
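As a minimal sketch of the Muenchian approach for this input, a stylesheet could index the records by their three operator attributes and suppress every record that is not the first in its group. The key name kRecByOps and the '|' separator are assumptions made for illustration; the separator must not occur in any attribute value.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index each record by the combined value of its three operator
       attributes. The key name and the '|' separator are illustrative. -->
  <xsl:key name="kRecByOps" match="record"
           use="concat(@operator1, '|', @operator2, '|', @operator3)"/>

  <!-- Identity rule: copy every node and attribute unchanged by default. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress any record that is not the first one in its key group. -->
  <xsl:template match="record[generate-id() !=
      generate-id(key('kRecByOps',
          concat(@operator1, '|', @operator2, '|', @operator3))[1])]"/>
</xsl:stylesheet>
```

Applied to the XML in the question, this sketch would keep one record per distinct attribute combination, namely the records with id 1, 2, 5 and 10.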
When this transformation is applied to the provided XML document, the wanted, correct result is produced.
Here is an example XSLT 1.0 stylesheet that eliminates the last of duplicate 'record' elements:
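One possible sketch, assuming a key on the three operator attributes (the key name kRecByOps and the '|' separator are illustrative choices, and the separator must not occur in any attribute value), overrides the identity rule for the last record of each group:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Group records by the combined value of their operator attributes.
       The key name and the '|' separator are illustrative. -->
  <xsl:key name="kRecByOps" match="record"
           use="concat(@operator1, '|', @operator2, '|', @operator3)"/>

  <!-- Identity rule: copy every node and attribute unchanged by default. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop the record that comes last (in document order) in its group.
       A record with no duplicates is the last of its own group,
       so it is dropped as well. -->
  <xsl:template match="record[generate-id() =
      generate-id(key('kRecByOps',
          concat(@operator1, '|', @operator2, '|', @operator3))[last()])]"/>
</xsl:stylesheet>
```

On the XML in the question this should remove the records with id 5, 8, 9 and 10 and leave 1, 2, 3, 4, 6 and 7, which matches the requested output. Unlike the XPath in the question, where each attribute comparison may be satisfied by a different following sibling, the key compares all three attributes of the same record at once.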