After reading some of the merge posts here, my question appears to be simpler, but I have not been able to find the answer, so I am posting a new question.
The original XML:
<data>
<proteins>
<protein>
<accession>111</accession>
</protein>
</proteins>
<peptides>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
</peptides>
</data>
The XSLT, used as an .xsl page to be interpreted by a browser:
<xsl:template match="/">
<xsl:apply-templates select="/data/proteins/protein" />
</xsl:template>
<xsl:template match="/data/proteins/protein">
<xsl:apply-templates select="/data/peptides/peptide[accession = current()/accession]" />
</xsl:template>
<xsl:template match="/data/peptides/peptide">
...
</xsl:template>
The output that I got (conceptually, since this is a simplification of a larger stylesheet):
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
And the output that I would like to have, i.e. only one entry for each sequence, to avoid redundancy:
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
I would be happy to keep just the first of the nodes that share the same sequence (so no merging is needed). Any help is very welcome :)
Thanks!
An alternate Muenchian grouping (just one template and a single instruction):
when this transformation is applied to the provided XML document:
the wanted, correct result is produced:
Explanation: Muenchian grouping where the key value is a combination of the values of two elements.
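The one-template grouping described above can be sketched roughly as follows. This is only an illustration, not necessarily the exact stylesheet the answer used: the key name `kPeptideByAccSeq` and the `'|'` separator are assumptions, and the element paths are taken from the XML in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical key name. Each peptide is indexed by its accession and
       sequence joined with '|' (an assumed separator, so that for example
       "11" + "1AAA" and "111" + "AAA" cannot collide). -->
  <xsl:key name="kPeptideByAccSeq" match="peptide"
           use="concat(accession, '|', sequence)"/>

  <!-- Single template, single instruction: copy only those peptides that
       are the first node returned by the key for their own key value. -->
  <xsl:template match="/">
    <xsl:copy-of select="/data/peptides/peptide
        [generate-id() =
         generate-id(key('kPeptideByAccSeq',
                         concat(accession, '|', sequence))[1])]"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the document in the question, this should keep the first `AAA` peptide and the first `BBB` peptide for accession 111 and drop the repeats.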
What your stylesheet is missing is a way to identify the first in a group of identical items. The following stylesheet uses an xsl:key to group peptide elements by a combination of their accession and sequence values:
Output:
Explanation: The following line:
...groups peptide elements using keys whose values are equal to concat(., accession, sequence). Elements can be later retrieved by reproducing the key for some peptide element:
To match the first element in the list of nodes returned for some key, we use the following template/pattern:
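As a sketch of how such a pattern fits into a complete stylesheet, something along these lines should work in an XSLT 1.0 processor. The key name `byAccSeq` and the `'|'` separator are assumptions made for this illustration, and the root template is simplified to visit the peptides directly instead of going through each protein as in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical key: groups peptide elements by accession + sequence. -->
  <xsl:key name="byAccSeq" match="peptide"
           use="concat(accession, '|', sequence)"/>

  <!-- Simplified entry point: process every peptide in document order. -->
  <xsl:template match="/">
    <xsl:apply-templates select="/data/peptides/peptide"/>
  </xsl:template>

  <!-- Matches a peptide only when it is the first node the key returns
       for its own key value; the predicate gives this pattern a higher
       default priority (0.5) than the plain "peptide" pattern (0). -->
  <xsl:template match="peptide[generate-id() =
      generate-id(key('byAccSeq', concat(accession, '|', sequence))[1])]">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Every other peptide (a repeat within its group) produces nothing. -->
  <xsl:template match="peptide"/>
</xsl:stylesheet>
```

To keep the per-protein selection from the question, the same two peptide templates could be left unchanged while the original `apply-templates` with the `accession = current()/accession` predicate is used to choose which peptides get processed.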
The generate-id function returns a unique identifier for every node in the document. We're asking for any peptide element whose unique ID is equal to the unique ID of a node that's first in the list for some key.
We then ignore all other peptide elements -- the ones that aren't first for some key -- with the following template:
This grouping technique is called the Muenchian Method. Further reading: