Exclude last occurrence of duplicate nodes. Duplica

I have been trying to write an XPath expression or XSLT stylesheet to detect and eliminate duplicate nodes. In my case, duplicate nodes are nodes that have the same values in several attributes. I want to eliminate duplicates by excluding the last occurrence of each duplicate node. Please advise if there is a better method.

Please note: duplicate nodes = nodes with the same values for the operator1, operator2 and operator3 attributes.

XML:

<data id = "root">
  <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/>
  <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="8" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/>
</data>

Output I need:

<data id = "root">
  <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>

The closest I have come is this XPath, but it does not quite work:

//record[(./@operator1 = following-sibling::record/@operator1) and (./@operator2 = following-sibling::record/@operator2) and (./@operator3 = following-sibling::record/@operator3)]

I have searched extensively without any luck. Any help is much appreciated. Thanks a lot.

Tags: xml xslt xpath

2 answers

Melony?

Reply #2 · 2019-07-25 10:19

I have been trying to write an XPath expression or XSLT stylesheet to detect and eliminate duplicate nodes. In my case, duplicate nodes are nodes that have the same values in several attributes. I want to eliminate duplicates by excluding the last occurrence of each duplicate node. Please advise if there is a better method.

Please note: duplicate nodes = nodes with the same values for the operator1, operator2 and operator3 attributes.

This definition of duplicate elimination contradicts itself.

You are not eliminating duplicate nodes by just removing the last of a sequence of duplicates. Your desired result:

<data id = "root">
    <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
    <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
    <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>

still contains duplicates, such as the records with ids (1, 4, and 6) and (2, 3, and 7).

Proper duplicate elimination, also called deduplication, leaves exactly one item from each group of duplicates. In XSLT 1.0 this is traditionally done with the Muenchian grouping method:
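The claim about remaining duplicates can be checked outside XSLT. Below is a minimal Python sketch (standard library only) that groups the records of the desired result by their three operator attributes; the function name `group_ids_by_operators` is made up for this illustration and is not part of either answer.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The asker's desired output, copied from the question.
DESIRED = """<data id = "root">
  <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>"""


def group_ids_by_operators(xml_text):
    """Map each (operator1, operator2, operator3) triple to the ids sharing it."""
    groups = defaultdict(list)
    for rec in ET.fromstring(xml_text).iter("record"):
        key = (rec.get("operator1"), rec.get("operator2"), rec.get("operator3"))
        groups[key].append(rec.get("id"))
    return dict(groups)


groups = group_ids_by_operators(DESIRED)
for key, ids in groups.items():
    print(key, ids)
# ('xxx', 'yyy', 'zzz') ['1', '4', '6']
# ('abc', 'yyy', 'zzz') ['2', '3', '7']
```

Both groups hold three records each, so the desired result is not deduplicated in the strict sense.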

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:key name="kRecByAtts" match="record"
  use="concat(@operator1,'***',
              @operator2,'***',
              @operator3)"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match=
  "record
      [not(generate-id()
      =
       generate-id(key('kRecByAtts',
                       concat(@operator1,'***',
                              @operator2,'***',
                              @operator3)
                       )[1]
                   )
            )
       ]
  "/>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the desired result from the question, not the original ten-record input):

<data id = "root">
    <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
    <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
    <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
    <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
</data>

the wanted, correct result is produced:

<data id="root">
   <record id="1" operator1="xxx" operator2="yyy" operator3="zzz"/>
   <record id="2" operator1="abc" operator2="yyy" operator3="zzz"/>
</data>
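For comparison, here is a rough Python sketch of the same keep-the-first-of-each-group rule, run against the original ten-record input from the question rather than the six-record document above. It mirrors the logic of the key, not the XSLT engine itself, and the name `keep_first_of_each_group` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The original input, copied from the question.
SOURCE = """<data id = "root">
  <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/>
  <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="8" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/>
</data>"""


def keep_first_of_each_group(xml_text):
    """Remove every record whose operator triple was already seen earlier."""
    root = ET.fromstring(xml_text)
    seen = set()
    for rec in list(root):  # iterate over a copy because children are removed
        key = (rec.get("operator1"), rec.get("operator2"), rec.get("operator3"))
        if key in seen:
            root.remove(rec)
        else:
            seen.add(key)
    return [rec.get("id") for rec in root]


kept = keep_first_of_each_group(SOURCE)
print(kept)  # ['1', '2', '5', '10']
```

On the full input, strict deduplication keeps records 1, 2, 5 and 10, which differs from the output requested in the question.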


We Are One

Reply #3 · 2019-07-25 10:30

Here is an example XSLT 1.0 stylesheet that removes the last 'record' element in each group of duplicates:

<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

  <xsl:key name="k1" match="record" 
           use="concat(@operator1, '|', @operator2, '|', @operator3)"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="record[generate-id() 
                              = 
                              generate-id(key('k1', 
                                              concat(@operator1, '|', 
                                                     @operator2, '|', 
                                                     @operator3))[last()])]"/>

</xsl:stylesheet>
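To see what this rule does to the input from the question, the following Python sketch applies the same drop-the-last-of-each-group logic with the standard library. It is only an approximation of the stylesheet's behavior under that reading, and `drop_last_of_each_group` is a name invented for the example.

```python
import xml.etree.ElementTree as ET

# The original input, copied from the question.
SOURCE = """<data id = "root">
  <record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/>
  <record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="8" operator1='abc' operator2='yyy' operator3='zzz'/>
  <record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/>
  <record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/>
</data>"""


def drop_last_of_each_group(xml_text):
    """Remove the last record of every operator-triple group, keep the rest."""
    root = ET.fromstring(xml_text)
    last_by_key = {}
    for rec in root.iter("record"):
        key = (rec.get("operator1"), rec.get("operator2"), rec.get("operator3"))
        last_by_key[key] = rec  # later records overwrite earlier ones
    for rec in last_by_key.values():
        root.remove(rec)
    return [rec.get("id") for rec in root.iter("record")]


kept = drop_last_of_each_group(SOURCE)
print(kept)  # ['1', '2', '3', '4', '6', '7']
```

A record with no duplicates is also the last member of its own group, so records 5 and 10 are removed along with 8 and 9. That matches the output requested in the question.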
