How can I eliminate duplicate nodes based on values of multiple (more than 1) attributes? Also the attribute names are passed as parameters to the stylesheet. Now I am aware of the Muenchian method of grouping that uses a <xsl:key>
element. But I came to know that XSLT 1.0 does not allow paramters/variables in <xsl:key>
.
Is there another method(s) to achieve duplicate nodes removal? It is fine if it not as efficient as the Munechian method.
Update from previus question:
XML:
<data id = "root">
<record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/>
<record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="8" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/>
</data>
Use this transformation (simple and no need to generate a new stylesheet):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ext="http://exslt.org/common">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="pAttribs">
<name>operator1</name>
<name>operator2</name>
<name>operator3</name>
</xsl:param>
<xsl:variable name="vAttribs" select=
"document('')/*/xsl:param[@name='pAttribs']"/>
<xsl:key name="kRecByAtts" match="record"
use="@___g_key"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:variable name="vrtdPass1">
<xsl:apply-templates/>
</xsl:variable>
<xsl:variable name="vPass1" select=
"ext:node-set($vrtdPass1)/*"/>
<xsl:apply-templates select="$vPass1"/>
</xsl:template>
<xsl:template match="record[not(@___g_key)]">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:attribute name="___g_key">
<xsl:for-each select="@*[name()=$vAttribs/name]">
<xsl:sort select="name()"/>
<xsl:value-of select=
"concat('___Attrib___',name(),'___Value___',.,'+++')"/>
</xsl:for-each>
</xsl:attribute>
</xsl:copy>
</xsl:template>
<xsl:template match=
"record[@___g_key]
[not(generate-id()
=
generate-id(key('kRecByAtts', @___g_key)[1])
)
]
"/>
<xsl:template match="@___g_key"/>
</xsl:stylesheet>
When applied to the XML document of your previous question:
<data id = "root">
<record id="1" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="2" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="3" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="4" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="5" operator1='xxx' operator2='lkj' operator3='tyu'/>
<record id="6" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="7" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="8" operator1='abc' operator2='yyy' operator3='zzz'/>
<record id="9" operator1='xxx' operator2='yyy' operator3='zzz'/>
<record id="10" operator1='rrr' operator2='yyy' operator3='zzz'/>
</data>
The wanted, correct result is produced:
<data id="root">
<record id="1" operator1="xxx" operator2="yyy" operator3="zzz"/>
<record id="2" operator1="abc" operator2="yyy" operator3="zzz"/>
<record id="5" operator1="xxx" operator2="lkj" operator3="tyu"/>
<record id="10" operator1="rrr" operator2="yyy" operator3="zzz"/>
</data>
Other approach for a single transformation in two steps:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
exclude-result-prefixes="msxsl">
<xsl:key name="kItemByLocal" match="record[@local-key]" use="@local-key"/>
<xsl:param name="pAttNames" select="'operator1 operator2 operator3'"/>
<xsl:template match="/">
<xsl:variable name="vFirstRTF">
<xsl:apply-templates/>
</xsl:variable>
<xsl:apply-templates select="msxsl:node-set($vFirstRTF)/node()"/>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record[not(@local-key)]">
<xsl:copy>
<xsl:attribute name="local-key">
<xsl:call-template name="local-key"/>
</xsl:attribute>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record[@local-key]
[count(.|key('kItemByLocal',@local-key)[1])
!= 1]|@local-key"/>
<xsl:template name="local-key">
<xsl:param name="pAttributes" select="concat($pAttNames,' ')"/>
<xsl:if test="normalize-space($pAttributes)">
<xsl:variable name="vName"
select="substring-before($pAttributes,' ')"/>
<xsl:variable name="vAttribute" select="@*[name()=$vName]"/>
<xsl:value-of select="concat($vName,'+',$vAttribute,'+')"/>
<xsl:call-template name="local-key">
<xsl:with-param name="pAttributes"
select="substring-after($pAttributes,' ')"/>
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Output:
<data id="root">
<record id="1" operator1="xxx" operator2="yyy" operator3="zzz"></record>
<record id="2" operator1="abc" operator2="yyy" operator3="zzz"></record>
<record id="5" operator1="xxx" operator2="lkj" operator3="tyu"></record>
<record id="10" operator1="rrr" operator2="yyy" operator3="zzz"></record>
</data>
Edit: Also without named template for @local-key
generation
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
exclude-result-prefixes="msxsl">
<xsl:key name="kItemByLocal" match="record[@local-key]" use="@local-key"/>
<xsl:param name="pAttNames" select="'operator1 operator2 operator3'"/>
<xsl:template match="/">
<xsl:variable name="vFirstRTF">
<xsl:apply-templates/>
</xsl:variable>
<xsl:apply-templates select="msxsl:node-set($vFirstRTF)/node()"/>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record[not(@local-key)]">
<xsl:variable name="vAttNames"
select="concat(' ',$pAttNames,' ')"/>
<xsl:copy>
<xsl:attribute name="local-key">
<xsl:for-each select="@*[contains(
$vAttNames,
concat(' ',name(),' ')
)]">
<xsl:sort select="substring-before(
$vAttNames,
concat(' ',name(),' ')
)"/>
<xsl:value-of select="concat(name(),'++',.,'++')"/>
</xsl:for-each>
</xsl:attribute>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="record[@local-key]
[count(.|key('kItemByLocal',@local-key)[1])
!= 1]|@local-key"/>
</xsl:stylesheet>
Note: If you are positive sure that attributes order is the same for all elements, then you could remove the sorting.
If you want to pass in the attribute names as a parameter then one approach could be a two step transformation where the first step takes any XML input and simply the attribute names and the element names as parameters to generate a second stylesheet that then eliminates the duplicates.
Here is an example first stylesheet:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias"
exclude-result-prefixes="axsl exsl"
version="1.0">
<xsl:param name="parent-name" select="'items'"/>
<xsl:param name="element-name" select="'item'"/>
<xsl:param name="att-names" select="'att1,att2'"/>
<xsl:param name="sep" select="'|'"/>
<xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
<xsl:output method="xml" indent="yes"/>
<xsl:variable name="key-value">
<xsl:text>concat(</xsl:text>
<xsl:call-template name="define-values">
<xsl:with-param name="att-names" select="$att-names"/>
</xsl:call-template>
<xsl:text>)</xsl:text>
</xsl:variable>
<xsl:template name="define-values">
<xsl:param name="att-names"/>
<xsl:choose>
<xsl:when test="contains($att-names, ',')">
<xsl:value-of select="concat('@', substring-before($att-names, ','), ',"', $sep, '",')"/>
<xsl:call-template name="define-values">
<xsl:with-param name="att-names" select="substring-after($att-names, ',')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="concat('@', $att-names)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="/">
<axsl:stylesheet version="1.0">
<axsl:output indent="yes"/>
<axsl:key name="k1" match="{$parent-name}/{$element-name}" use="{$key-value}"/>
<axsl:template match="@* | node()">
<axsl:copy>
<axsl:apply-templates select="@* | node()"/>
</axsl:copy>
</axsl:template>
<axsl:template match="{$parent-name}">
<axsl:copy>
<axsl:apply-templates select="@*"/>
<axsl:apply-templates select="{$element-name}[generate-id() = generate-id(key('k1', {$key-value})[1])]"/>
</axsl:copy>
</axsl:template>
</axsl:stylesheet>
</xsl:template>
</xsl:stylesheet>
It takes four parameters:
parent-name
: the name of the element containing those elements of which you want to eliminate duplicates
element-name
: the name of those elements of which you want to eliminate duplicates
att-names
: a comma separated list of attribute names
sep
: a separator character that should not occur in attribute values in the input XML
The stylesheet then generates a second stylesheet that applies Muenchian grouping to eliminate duplicates. For instance with the default parameters given in the stylesheet Saxon 6.5.5 generates the following stylesheet:
<axsl:stylesheet xmlns:axsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<axsl:output indent="yes"/>
<axsl:key name="k1" match="items/item" use="concat(@att1,"|",@att2)"/>
<axsl:template match="@* | node()">
<axsl:copy>
<axsl:apply-templates select="@* | node()"/>
</axsl:copy>
</axsl:template>
<axsl:template match="items">
<axsl:copy>
<axsl:apply-templates select="@*"/>
<axsl:apply-templates select="item[generate-id() = generate-id(key('k1', concat(@att1,"|",@att2))[1])]"/>
</axsl:copy>
</axsl:template>
</axsl:stylesheet>
This can the be applied to an XML document like
<items>
<item att1="a" att2="1" att3="A"/>
<item att1="b" att2="1" att3="A"/>
<item att1="a" att2="1" att3="B"/>
<item att1="c" att2="2" att3="A"/>
<item att1="d" att2="3" att3="C"/>
</items>
and the output is
<items>
<item att1="a" att2="1" att3="A"/>
<item att1="b" att2="1" att3="A"/>
<item att1="c" att2="2" att3="A"/>
<item att1="d" att2="3" att3="C"/>
</items>