I have this XML file:
<?xml version="1.0" encoding="UTF-8"?>
<d:dictionary xmlns="http://www.w3.org/1999/xhtml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="a" d:title="a">
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<d:index d:value="a" d:title="a"/>
<d:index d:value="c" d:title="c"/>
<d:index d:value="b" d:title="b"/>
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<div>This is the content for entry.</div>
</d:entry>
<d:entry id="b" d:title="b">
<d:index d:value="a" d:title="a"/>
<d:index d:value="b" d:title="b"/>
<div>This is the content for entry.</div>
</d:entry>
</d:dictionary>
I'm trying to remove the duplicate <d:index> elements within each entry using XSLT, following this posting: https://stackoverflow.com/a/56898207/589924
Note: Every entry has its own independent set of <d:index> elements, i.e. the same index in different entries should not count as a duplicate. The resulting XML should also preserve the original XML format.
The XSL file looks like this:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<xsl:template>
<xsl:copy>
<xsl:for-each-group select="d:index"
group-by="concat(@d:value, '~', @d:title)">
<xsl:copy-of select="current-group()[1]"/>
</xsl:for-each-group>
<xsl:copy-of select="div"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
But the result is not what I expected: it removes all tags and keeps only the text content of the div elements.
<?xml version="1.0"?>
This is the content for entry.
This is the content for entry.
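For comparison, here is a minimal sketch of how the XSLT 2.0 approach above might be repaired. It assumes the intent was a template rule matching d:entry, and that the text-only output comes from no rule matching the namespaced elements, so the built-in rules emit only text nodes. The identity rule, the strip-space/indent settings, and the `* except d:index` selection are choices made for this sketch, not part of the original stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">

  <!-- Assumption: re-indenting the output is acceptable -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Identity rule: copy everything that has no more specific rule -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Group the d:index children of each entry separately -->
  <xsl:template match="d:entry">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="d:index"
                          group-by="concat(@d:value, '~', @d:title)">
        <!-- Keep only the first d:index of each group -->
        <xsl:copy-of select="current-group()[1]"/>
      </xsl:for-each-group>
      <!-- Copy the remaining children (e.g. the XHTML div) unchanged -->
      <xsl:copy-of select="* except d:index"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because the grouping runs inside the d:entry rule, an index that appears in two different entries is never treated as a duplicate. Selecting `* except d:index` avoids having to name the div element, which lives in the default XHTML namespace and is therefore not matched by an unprefixed `div` step. This requires an XSLT 2.0 processor such as Saxon, and the exact whitespace of the output depends on the processor.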
Sometimes using a programming library directly may be easier. The following is a Perl script using XML::DT; as usual, run sudo cpan XML::DT if it is not installed.
Using the Muenchian method for grouping:
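A minimal XSLT 1.0 sketch of Muenchian grouping applied to this input might look like the following. The key name kIndex and the composite key (parent identity plus d:value and d:title) are assumptions made for illustration, as are the strip-space and indent settings.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">

  <!-- Assumption: re-indenting the output is acceptable -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Key each d:index by its parent entry plus its value and title,
       so equal indexes in different entries land in different groups -->
  <xsl:key name="kIndex" match="d:index"
           use="concat(generate-id(..), '~', @d:value, '~', @d:title)"/>

  <!-- Identity rule: copy every node and attribute as-is -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty rule: drop any d:index that is not the first node
       in its key group, i.e. a duplicate within the same entry -->
  <xsl:template match="d:index[generate-id() != generate-id(
      key('kIndex',
          concat(generate-id(..), '~', @d:value, '~', @d:title))[1])]"/>
</xsl:stylesheet>
```

Since this uses only XSLT 1.0 features, it can be run with a 1.0 processor, for example `xsltproc dedup.xsl dict.xml` (both file names are hypothetical). The empty template carries a predicate, so its default priority is higher than that of the identity rule and it wins for duplicate d:index elements.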
When this transformation is applied to the provided XML document, the wanted, correct result is produced: