Need XSLT transform to remove duplicate elements

Posted 2019-06-18 15:08

I have a terrible piece of XML that I need to process through BizTalk, and I have managed to normalise it into this example below. I am no XSLT ninja, but between the web and the VS2010 debugger, I can find my way around XSL.

I now need a clever bit of XSLT to "weed out" the duplicate elements and only keep the latest ones, as decided by the date in the ValidFromDate attribute.

The ValidFromDate attribute is of the XSD date type (xs:date).

<SomeData>
  <A ValidFromDate="2011-12-01">A_1</A>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2011-12-03">B_1</B>
  <B ValidFromDate="2012-01-17">B_2</B>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
  <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>

After a transformation I'd like to only keep these lines:

<SomeData>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
</SomeData>

Any clues as to how I put that XSL together? I've emptied the internet trying to look for a solution, and I have tried a lot of clever XSL sorting scripts, but none I felt took me in the right direction.

6 answers
迷人小祖宗
#2 · 2019-06-18 15:30

The following stylesheet produces the correct result without any reliance on the input order:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:key name="byName" match="/SomeData/*" use="name()"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="SomeData">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:for-each select="*[generate-id()=
                                    generate-id(key('byName', name())[1])]">
                <xsl:apply-templates select="key('byName', name())" mode="out">
                    <xsl:sort select="translate(@ValidFromDate, '-', '')" 
                              data-type="number" order="descending"/>
                </xsl:apply-templates>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="SomeData/*" mode="out">
        <xsl:if test="position()=1">
            <xsl:apply-templates select="."/>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

Output:

<SomeData>
   <A ValidFromDate="2012-01-19">A_2</A>
   <B ValidFromDate="2012-01-19">B_3</B>
   <C ValidFromDate="2012-01-20">C_1</C>
</SomeData>

Note that C_1 is the latest C element even though it appears before C_2 in the input, i.e. the input is not already sorted by date. Answers that rely on the input order pick C_2 instead, which is incorrect.

Explanation:

  • An xsl:key groups all /SomeData/* by name()
  • The outer for-each selects the first item in each group
  • Templates are then applied to all members of that group, which are sorted by @ValidFromDate
  • A single additional template handles picking the first element out of each sorted group
  • An Identity Transform template takes care of the rest
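
To check the expected output outside an XSLT processor, the steps above can be mirrored in a few lines of Python. This is only a sketch of the same group, sort, pick-first logic, not a replacement for the stylesheet. The names `date_key` and `latest_per_name` are made up for illustration, and the sample input assumes the attribute is spelled ValidFromDate on every element.

```python
import xml.etree.ElementTree as ET

# Sample input from the question (attribute name assumed to be
# ValidFromDate on every element).
SAMPLE = """<SomeData>
  <A ValidFromDate="2011-12-01">A_1</A>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2011-12-03">B_1</B>
  <B ValidFromDate="2012-01-17">B_2</B>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
  <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>"""


def date_key(element):
    """Numeric sort key, like translate(@ValidFromDate, '-', '') in the stylesheet."""
    digits = element.get("ValidFromDate", "").replace("-", "")
    # A missing or malformed attribute sorts after every real date.
    return int(digits) if digits.isdigit() else -1


def latest_per_name(xml_text):
    """Return the text of the latest element for each element name."""
    root = ET.fromstring(xml_text)
    groups = {}  # element name -> members, in order of first appearance
    for child in root:
        groups.setdefault(child.tag, []).append(child)
    result = []
    for members in groups.values():
        # Sort each group newest first and keep only the first member.
        ranked = sorted(members, key=date_key, reverse=True)
        result.append(ranked[0].text)
    return result


print(latest_per_name(SAMPLE))  # ['A_2', 'B_3', 'C_1']
```

The dictionary plays the role of the xsl:key, and taking `ranked[0]` corresponds to the `position()=1` test in the `out` mode template.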
爷、活的狠高调
#3 · 2019-06-18 15:36

XSLT 2.0 solution without relying on input order.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <SomeData>
            <xsl:for-each-group select="/SomeData/*" group-by="name()">
                <xsl:for-each select="current-group()">
                    <xsl:sort select="number(substring(@ValidFromDate,1,4))" order="descending" data-type="number"/> <!-- year -->
                    <xsl:sort select="number(substring(@ValidFromDate,6,2))" order="descending" data-type="number"/> <!-- month -->
                    <xsl:sort select="number(substring(@ValidFromDate,9,2))" order="descending" data-type="number"/> <!-- day -->
                    <xsl:if test="position()=1">
                        <xsl:sequence select="."/>
                    </xsl:if>
                </xsl:for-each>
            </xsl:for-each-group>
        </SomeData>
    </xsl:template>
</xsl:stylesheet>
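
The three numeric sort keys order the elements by year, then month, then day. For dates written as YYYY-MM-DD with four-digit years and no time zone, that should give the same order as a plain string comparison of the attribute value. A quick Python check of that claim on the dates from the sample input (the helper name `ymd` is invented for this sketch):

```python
from datetime import date

# The distinct dates that occur in the sample input.
dates = [
    "2011-12-01", "2012-01-19", "2011-12-03",
    "2012-01-17", "2012-01-20", "2011-01-20",
]


def ymd(value):
    """Split YYYY-MM-DD into numeric (year, month, day), like the three sort keys."""
    return (int(value[0:4]), int(value[5:7]), int(value[8:10]))


by_parts = sorted(dates, key=ymd, reverse=True)                # year, month, day keys
by_string = sorted(dates, reverse=True)                        # plain text comparison
by_date = sorted(dates, key=date.fromisoformat, reverse=True)  # real calendar dates

print(by_parts == by_string == by_date)  # True
print(by_parts[0])                       # 2012-01-20
```

The check covers only these sample values; dates with time zones or years outside 0001 to 9999 would need a real date comparison.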
Root(大扎)
#4 · 2019-06-18 15:38

Based on @ValidFromDate order:

XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="k" match="*" use="name()"/>

  <xsl:template match="SomeData">
    <xsl:copy>
      <xsl:apply-templates select="*[generate-id() = 
                           generate-id(key('k', name()))]"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:apply-templates select="key('k', name())" mode="a">
      <xsl:sort select="@ValidFromDate" order="descending"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="*" mode="a">
    <xsl:if test="position() = 1">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

applied on:

<SomeData>
  <A ValidFromDate="2011-12-01">A_1</A>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2011-12-03">B_1</B>
  <B ValidFromDate="2012-01-17">B_2</B>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
  <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>

produces:

<SomeData>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
</SomeData>
聊天终结者
#5 · 2019-06-18 15:47

The optimal solution for this problem with XSLT 1.0 is Muenchian grouping. Provided that the elements are already sorted by the ValidFromDate attribute, the following stylesheet should do the trick:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="element-key" match="/SomeData/*" use="name()" />

  <xsl:template match="/SomeData">
    <xsl:copy>
      <xsl:for-each select="*[generate-id() = generate-id(key('element-key', name()))]">
        <xsl:copy-of select="(. | following-sibling::*[name() = name(current())])[last()]" />
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Here is the result I got when running it against your sample XML:

<?xml version="1.0" encoding="utf-8"?>
<SomeData>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>
我欲成王,谁敢阻挡
#6 · 2019-06-18 15:52

Based on Pawel's answer, I made the following modification, which produces the same result:

<xsl:template match="/SomeData">
  <xsl:copy>
    <xsl:copy-of select="*[generate-id() = generate-id(key('element-key', name())[last()])]"/>
  </xsl:copy>
</xsl:template>

If they produce the same result every time, I like this because it's a little cleaner.
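
Whether this variant agrees with the date-based answers depends on the input. `[last()]` picks the last element of each group in document order, so it only returns the latest element when each group is already sorted by date. A small Python sketch makes the difference visible on the sample input; the function names are invented for this illustration, and the sample assumes ValidFromDate is spelled the same on every element.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<SomeData>
  <A ValidFromDate="2011-12-01">A_1</A>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2011-12-03">B_1</B>
  <B ValidFromDate="2012-01-17">B_2</B>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
  <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>"""


def group_by_name(xml_text):
    """Group the children of the root element by element name."""
    groups = {}
    for child in ET.fromstring(xml_text):
        groups.setdefault(child.tag, []).append(child)
    return groups


def last_in_document_order(xml_text):
    """Mirror key('element-key', name())[last()]: last member of each group."""
    return [members[-1].text for members in group_by_name(xml_text).values()]


def latest_by_date(xml_text):
    """Pick the member with the greatest ValidFromDate in each group."""
    return [
        # YYYY-MM-DD strings compare in chronological order.
        max(members, key=lambda e: e.get("ValidFromDate", "")).text
        for members in group_by_name(xml_text).values()
    ]


print(last_in_document_order(SAMPLE))  # ['A_2', 'B_3', 'C_2']
print(latest_by_date(SAMPLE))          # ['A_2', 'B_3', 'C_1']
```

The two lists differ only for C, the one group whose elements are not in ascending date order.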

beautiful°
#7 · 2019-06-18 15:53

A slightly simpler and shorter XSLT 1.0 solution than that of @lwburk:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kName" match="*/*" use="name()"/>

 <xsl:template match="/">
  <xsl:apply-templates select=
   "*/*[generate-id()
       =
        generate-id(key('kName', name())[1])
       ]
   "/>
 </xsl:template>

 <xsl:template match="*/*">
  <xsl:for-each select="key('kName', name())">
   <xsl:sort select="@ValidFromDate" order="descending"/>
   <xsl:if test="position() = 1">
    <xsl:copy-of select="."/>
   </xsl:if>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<SomeData>
    <A ValidFromDate="2011-12-01">A_1</A>
    <A ValidFromDate="2012-01-19">A_2</A>
    <B ValidFromDate="2011-12-03">B_1</B>
    <B ValidFromDate="2012-01-17">B_2</B>
    <B ValidFromDate="2012-01-19">B_3</B>
    <C ValidFromDate="2012-01-20">C_1</C>
    <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>

the wanted elements are produced (this stylesheet does not copy the SomeData wrapper element):

<A ValidFromDate="2012-01-19">A_2</A>
<B ValidFromDate="2012-01-19">B_3</B>
<C ValidFromDate="2012-01-20">C_1</C>