javascript XSLT nodes, selecting the first of a gr

Posted 2019-07-16 04:05

Question:

After reading some of the merge posts here, my question appears to be simpler, but I have not been able to find the answer, so I am posting a new question.

The original XML:

<data>

<proteins>
<protein>
<accession>111</accession>
</protein>
</proteins>

<peptides>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
</peptides>

</data>

The XSLT, used as an .xsl page to be interpreted by a browser:

<xsl:template match="/">
<xsl:apply-templates select="/data/proteins/protein" />
</xsl:template>

<xsl:template match="/data/proteins/protein">
<xsl:apply-templates select="/data/peptides/peptide[accession = current()/accession]" />
</xsl:template>

<xsl:template match="/data/peptides/peptide">
...
</xsl:template>

The output that I got (conceptually, since this is a simplification of a larger stylesheet):

<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>

And the output that I would like to have, i.e. only one entry for each sequence, to avoid redundancy:

<peptide>
<accession>111</accession>
<sequence>AAA</sequence>
</peptide>
<peptide>
<accession>111</accession>
<sequence>BBB</sequence>
</peptide>

I would be happy to get just the first of the nodes that share the same sequence (so no merging is needed). Any help is very welcome :)

Thanks!

Answer 1:

What your stylesheet is missing is a way to identify the first in a group of identical items. The following stylesheet uses an xsl:key to group peptide elements by a combination of their accession and sequence values:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:key name="byAccSeq" match="peptide" 
                             use="concat(accession, '|', sequence)"/>
    <xsl:template match="/">
        <root><xsl:apply-templates select="/*/proteins/protein"/></root>
    </xsl:template>
    <xsl:template match="protein">
        <xsl:apply-templates
            select="../../peptides/peptide[accession=current()/accession]"/>
    </xsl:template>
    <xsl:template match="peptide[generate-id()=
             generate-id(key('byAccSeq', concat(accession, '|', sequence))[1])]">
        <xsl:copy-of select="."/>
    </xsl:template>
    <xsl:template match="peptide"/>
</xsl:stylesheet>

Output:

<root>
    <peptide>
        <accession>111</accession>
        <sequence>AAA</sequence>
    </peptide>
    <peptide>
        <accession>111</accession>
        <sequence>BBB</sequence>
    </peptide>
</root>

Explanation: The following line:

<xsl:key name="byAccSeq" match="peptide" 
                         use="concat(accession, '|', sequence)"/>

...groups peptide elements using keys whose values are equal to concat(accession, '|', sequence). Elements can later be retrieved by reproducing the key value for some peptide element (with that peptide as the context node):

key('byAccSeq', concat(accession, '|', sequence))

To match the first element in the list of nodes returned for some key, we use the following template/pattern:

<xsl:template match="peptide[generate-id()=
               generate-id(key('byAccSeq', concat(accession, '|', sequence))[1])]">

The generate-id function returns a unique identifier for every node in the document. We're asking for any peptide element whose unique ID is equal to the unique ID of a node that's first in the list for some key.
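
An equivalent way to test whether the current node is the first in its group, often seen alongside generate-id(), is to take the union of the node with the first key match and count the result. The following is a minimal sketch of that variant, reusing the byAccSeq key; it is an alternative formulation, not part of the original answer, and it should select the same two peptide elements for the question's input.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <!-- Same key as in the answer: accession and sequence joined by '|' -->
    <xsl:key name="byAccSeq" match="peptide"
             use="concat(accession, '|', sequence)"/>
    <xsl:template match="/">
        <root><xsl:apply-templates select="/*/peptides/peptide"/></root>
    </xsl:template>
    <!-- A node-set never holds the same node twice, so the union has
         exactly one member only when the current peptide IS the first
         node stored under its key value -->
    <xsl:template match="peptide[count(. | key('byAccSeq',
                 concat(accession, '|', sequence))[1]) = 1]">
        <xsl:copy-of select="."/>
    </xsl:template>
    <!-- Every other peptide is a repeat and produces no output -->
    <xsl:template match="peptide"/>
</xsl:stylesheet>
```

Both tests compare node identity rather than content; which one to use is largely a matter of taste.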

We then ignore all other peptide elements -- the ones that aren't first for some key -- with the following template:

<xsl:template match="peptide"/>
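
The two peptide templates do not conflict because of XSLT 1.0's default priorities: a pattern with a predicate gets priority 0.5, while a bare element name gets 0, so the first-in-group template wins whenever both match. A sketch that spells this out with explicit priority attributes (the values 1 and 0 are illustrative choices, not taken from the answer):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:key name="byAccSeq" match="peptide"
             use="concat(accession, '|', sequence)"/>
    <xsl:template match="/">
        <root><xsl:apply-templates select="/*/peptides/peptide"/></root>
    </xsl:template>
    <!-- Higher priority: chosen for the first peptide of each group -->
    <xsl:template priority="1" match="peptide[generate-id()=
             generate-id(key('byAccSeq', concat(accession, '|', sequence))[1])]">
        <xsl:copy-of select="."/>
    </xsl:template>
    <!-- Lower priority: only reached by peptides that are not first -->
    <xsl:template priority="0" match="peptide"/>
</xsl:stylesheet>
```

The explicit attributes do not change the behaviour here; they only document the intent and guard against later edits that alter the default priorities.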

This grouping technique is called the Muenchian Method. Further reading:

  • http://www.jenitennison.com/xslt/grouping/muenchian.html


Answer 2:

An alternate Muenchian grouping (just one template and a single instruction):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kPepByAccAndSeq" match="peptide"
  use="concat(accession, '+', sequence)"/>

 <xsl:template match="/">
   <xsl:copy-of select=
    "/*/peptides
          /peptide
              [generate-id()
              =
               generate-id(key('kPepByAccAndSeq',
                               concat(accession, '+', sequence)
                              )[1]
                          )
              ]
    "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<data>
    <proteins>
        <protein>
            <accession>111</accession>
        </protein>
    </proteins>
    <peptides>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>AAA</sequence>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
        </peptide>
        <peptide>
            <accession>111</accession>
            <sequence>BBB</sequence>
        </peptide>
    </peptides>
</data>

the desired result is produced:

<peptide>
   <accession>111</accession>
   <sequence>AAA</sequence>
</peptide>
<peptide>
   <accession>111</accession>
   <sequence>BBB</sequence>
</peptide>

Explanation: Muenchian grouping where the key value is a combination of the values of two elements.
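
The separator in the key value matters: without it, different accession/sequence pairs can concatenate to the same string and end up in one group. Below is a small sketch that counts the groups seen by a key with and without a separator. The key names kNoSep and kSep, the output element names, and the values A1, A1B, BC and C are made up for this illustration and do not appear in the question's data.

```xml
<!-- Hypothetical input where a key without a separator collides:
       <peptide><accession>A1</accession><sequence>BC</sequence></peptide>
       <peptide><accession>A1B</accession><sequence>C</sequence></peptide>
     concat(accession, sequence)       gives "A1BC" for both peptides
     concat(accession, '+', sequence)  gives "A1+BC" and "A1B+C" -->
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kNoSep" match="peptide"
  use="concat(accession, sequence)"/>
 <xsl:key name="kSep" match="peptide"
  use="concat(accession, '+', sequence)"/>

 <xsl:template match="/">
   <counts>
     <!-- Each count is the number of peptides that are first in their group,
          i.e. the number of distinct groups under that key -->
     <no-separator>
       <xsl:value-of select="count(//peptide[generate-id() =
          generate-id(key('kNoSep', concat(accession, sequence))[1])])"/>
     </no-separator>
     <with-separator>
       <xsl:value-of select="count(//peptide[generate-id() =
          generate-id(key('kSep', concat(accession, '+', sequence))[1])])"/>
     </with-separator>
   </counts>
 </xsl:template>
</xsl:stylesheet>
```

For the two hypothetical peptides, the key without a separator sees one group while the key with a separator sees two. The separator only helps if it is a character that cannot occur in the accession or sequence values themselves.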