Distinct values with XSLT 1.0 when XPath has multi

Posted 2019-05-04 09:39

Yet another question about getting distinct values using XSLT 1.0. Here's a stupid, made-up example that should illustrate my problem.

<?xml version="1.0" encoding="UTF-8"?>
<moviesByYear>
    <year1994>
        <movie>
            <genre>Action</genre>
            <director>A</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Comedy</genre>
            <director>A</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Drama</genre>
            <director>B</director>
        </movie>
    </year1994>
    <year1994>
        <movie>
            <genre>Thriller</genre>
            <director>C</director>
        </movie>
    </year1994>
    <year1995>
        <movie>
            <genre>Action</genre>
            <director>A</director>
        </movie>
    </year1995>
    <year1995>
        <movie>
            <genre>Comedy</genre>
            <director>C</director>
        </movie>
    </year1995>
    <year1996>
        <movie>
            <genre>Thriller</genre>
            <director>A</director>
        </movie>
    </year1996>
</moviesByYear>

Now let's say that I'd like to list all years that produced movies that are either comedies or directed by director B. I use the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:output method="text" encoding="UTF-8" indent="no"/>
    <xsl:template match="/">
        <xsl:for-each select="/moviesByYear/*[movie/genre='Comedy' or movie/director='B']">
            <xsl:value-of select="name()"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

This gives me the following output:

year1994year1994year1995

I have not yet found any solution for getting distinct values that would work here. For example, using name(.) != name(following-sibling::*) causes year1994 to be excluded altogether.

In my real-world case I have a complex XML structure and an XPath with many criteria that picks out a number of nodes, from which I need to get an output of distinct node names.

Update: michael.hor257k gave an elegant solution to this, but using it I faced a problem with xsl:key. Allow me to alter the scenario a bit:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <genres>
        <genre>Action</genre>
        <genre>Comedy</genre>
        <genre>Drama</genre>
        <genre>Thriller</genre>
    </genres>
    <moviesByYear>
        <year1994>
            <movie>
                <genre>Action</genre>
                <director>A</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Comedy</genre>
                <director>A</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Drama</genre>
                <director>B</director>
            </movie>
        </year1994>
        <year1994>
            <movie>
                <genre>Thriller</genre>
                <director>C</director>
            </movie>
        </year1994>
        <year1995>
            <movie>
                <genre>Action</genre>
                <director>A</director>
            </movie>
        </year1995>
        <year1995>
            <movie>
                <genre>Comedy</genre>
                <director>C</director>
            </movie>
        </year1995>
        <year1996>
            <movie>
                <genre>Thriller</genre>
                <director>A</director>
            </movie>
        </year1996>
    </moviesByYear>
</root>

Now let's say that I want a list of genres, each of which lists years that produced movies of that genre or movies directed by director B. Stylesheet:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="urn:schemas-microsoft-com:xslt"
extension-element-prefixes="exsl">
<xsl:output method="text" version="1.0" encoding="UTF-8" indent="no"/>

<xsl:template match="/">
    <xsl:for-each select="/root/genres/genre">
        <xsl:call-template name="output">
            <xsl:with-param name="genre">
                <xsl:value-of select="."/>
            </xsl:with-param>
        </xsl:call-template>
    </xsl:for-each>
</xsl:template>

<xsl:param name="director" select="'B'"/>

<xsl:key name="year" match="year" use="." />

<xsl:template name="output">
    <xsl:param name="genre"/>

    <!-- first pass -->
    <xsl:variable name="years">
        <xsl:for-each select="/root/moviesByYear/*/movie[genre=$genre or director=$director]"> 
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />

    <!-- final pass -->
    <xsl:value-of select="concat($genre, ': ')"/> 
    <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
        <xsl:value-of select="."/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>

</xsl:template>

</xsl:stylesheet>

This produces the following output:

Action: year1994year1995
Comedy: 
Drama: 
Thriller: year1996

As you can see, each year is listed only once. The desired output would have been:

Action: year1994year1995
Comedy: year1994year1995
Drama: year1994
Thriller: year1994year1996

5 answers
乱世女痞
Reply #2 · 2019-05-04 10:09

This might work for you:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
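    <xsl:for-each select="moviesByYear/*[movie/genre='Comedy' or movie/director='B']">
        <xsl:variable name="name" select="name()"/>
        <!-- XPath 1.0 does not allow a .../name(.) step, so the name is compared through a variable -->
        <xsl:if test="not(following-sibling::*[movie/genre='Comedy' or movie/director='B'][name() = $name])">
            <xsl:value-of select="name()"/>
        </xsl:if>
    </xsl:for-each>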
</xsl:template>
</xsl:stylesheet>
Luminary・发光体
Reply #3 · 2019-05-04 10:18

Here's a different implementation of Muenchian grouping - one that allows you to parametrize the criteria by which the movies are selected.

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="genre" select="'Comedy'"/>
<xsl:param name="director" select="'B'"/>

<xsl:key name="year" match="year" use="." />

<xsl:template match="/">

    <!-- first pass -->
    <xsl:variable name="years">
        <xsl:for-each select="moviesByYear/*/movie[genre=$genre or director=$director]"> 
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />

    <!-- final pass -->
    <output>
        <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </output>

</xsl:template>

</xsl:stylesheet>

When the above is applied to your example input, the result is:

<?xml version="1.0" encoding="UTF-8"?>
<output>
   <year>year1994</year>
   <year>year1995</year>
</output>

Edit:

With regard to your modified input, I believe I would do it this way:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="director" select="'B'"/>

<xsl:key name="movies-by-genre" match="movie" use="genre" />
<xsl:key name="movies-by-director" match="movie" use="director" />
<xsl:key name="year" match="year" use="." />

<xsl:template match="/">
    <output>
        <xsl:apply-templates select="root/genres/genre"/>
    </output>
</xsl:template>

<xsl:template match="genre">
    <!-- first pass -->
    <xsl:variable name="years">
        <xsl:for-each select="key('movies-by-genre', .) | key('movies-by-director', $director)"> 
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />
    <!-- final pass -->
    <genre name="{.}">
        <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </genre>
</xsl:template>

</xsl:stylesheet>

The result here is:

<?xml version="1.0" encoding="UTF-8"?>
<output>
   <genre name="Action">
      <year>year1994</year>
      <year>year1995</year>
   </genre>
   <genre name="Comedy">
      <year>year1994</year>
      <year>year1995</year>
   </genre>
   <genre name="Drama">
      <year>year1994</year>
   </genre>
   <genre name="Thriller">
      <year>year1994</year>
      <year>year1996</year>
   </genre>
</output>

Note: the two added keys are for efficiency only - they are not required for the main purpose here.
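
To illustrate that note, here is a minimal sketch of the same stylesheet with the two extra keys dropped. It assumes the modified input above and a processor that supports exsl:node-set(); the first pass filters the movies with a plain predicate, using current() to refer to the genre element being processed. Only the grouping key "year" remains, because the Muenchian test in the final pass depends on it.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="director" select="'B'"/>

<!-- the only key left: it indexes the temporary <year> elements by their text -->
<xsl:key name="year" match="year" use="." />

<xsl:template match="/">
    <output>
        <xsl:apply-templates select="root/genres/genre"/>
    </output>
</xsl:template>

<xsl:template match="genre">
    <!-- first pass: current() is the genre element this template is processing -->
    <xsl:variable name="years">
        <xsl:for-each select="/root/moviesByYear/*/movie[genre = current() or director = $director]">
            <year><xsl:value-of select="local-name(..)"/></year>
        </xsl:for-each>
    </xsl:variable>
    <xsl:variable name="years-set" select="exsl:node-set($years)" />
    <!-- final pass: same Muenchian test as before, run against the temporary tree -->
    <genre name="{.}">
        <xsl:for-each select="$years-set/year[count(. | key('year', .)[1]) = 1]">
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </genre>
</xsl:template>

</xsl:stylesheet>
```

The predicate scans every movie once per genre, which is the cost the two keys avoid on larger inputs.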


Edit 2:

On second thought, we could do this all in a single pass, thus (hopefully) avoiding the issues Xalan and MSXML have with processing a variable - but still using Muenchian grouping:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="director" select="'B'"/>

<xsl:key name="year" match="moviesByYear/*" use="local-name()" />

<xsl:template match="/">
    <output>
        <xsl:apply-templates select="root/genres/genre"/>
    </output>
</xsl:template>

<xsl:template match="genre">
    <xsl:variable name="genre" select="." />
    <genre name="{$genre}">
        <xsl:for-each select="../../moviesByYear/* 
        [count(. | key('year', local-name())[1]) = 1]
        [key('year', local-name())/movie[genre=$genre or director=$director]]">
            <year>
                <xsl:value-of select="local-name()"/>
            </year>  
        </xsl:for-each>
    </genre>
</xsl:template>

</xsl:stylesheet>
爱情/是我丢掉的垃圾
Reply #4 · 2019-05-04 10:26

I don't think there is a way to do that with a straight XPath expression. The only way I can see to do this is using an XSLT variable (or, in my example, a template parameter).

Here is what the code does. In the for-each loop of the main template, the XPath /moviesByYear/*[name(.) != name(following-sibling::*)] selects the last instance of each child year, so the node-set contains only one element per year. (name() applied to a node-set uses only its first node, so each element is compared with its immediately following sibling; this relies on elements of the same year being adjacent, as they are in your example.) We don't care whether that element matches our actual criteria, only about its name.

Then we pass that name as a parameter to the named template foo, which selects the elements of that year that match the criteria and takes the first of them: /moviesByYear/*[name(.) = $year][movie/genre='Comedy' or movie/director='B'][1]. If such an element exists, we output its name.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:output method="text" encoding="UTF-8" indent="no"/>

<xsl:template match="/">
  <xsl:for-each select="/moviesByYear/*[name(.) != name(following-sibling::*)]">
    <xsl:call-template name="foo">
      <xsl:with-param name="year" select="name(.)"/>
    </xsl:call-template>
  </xsl:for-each>
</xsl:template>

<xsl:template name="foo">
 <xsl:param name="year" select="'year1994'" />
 <xsl:for-each select="/moviesByYear/*[name(.) = $year][movie/genre='Comedy' or movie/director='B'][1]">
            <xsl:value-of select="name(.)"/>
        </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

When run against your test data, this stylesheet produces the output:

year1994year1995
ら.Afraid
Reply #5 · 2019-05-04 10:29

Now, just because it was said it couldn't be done, here's an XPath one-liner:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:param name="genre" select="'Comedy'"/>
<xsl:param name="director" select="'B'"/>

<xsl:template match="/">
    <output>
        <xsl:for-each select="//movie[genre=$genre or director=$director] 
        [not(local-name(..)=local-name(preceding::movie[genre=$genre or director=$director]/parent::*))]">
            <year>       
                <xsl:value-of select="local-name(..)"/>
            </year>  
        </xsl:for-each>
    </output>
</xsl:template>

</xsl:stylesheet>

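Not recommended: it is inefficient, and because local-name() applied to a node-set looks only at the first node in document order, each movie is compared only with the first matching year. It gives the right result for the sample input, but a year can be repeated on other inputs (for example, if year1995 contained a second matching movie).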

仙女界的扛把子
Reply #6 · 2019-05-04 10:30

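Use Muenchian grouping (http://www.jenitennison.com/xslt/grouping/muenchian.xml):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <!-- index only the year elements that match the criteria, by element name -->
    <xsl:key name="k1" match="moviesByYear/*[movie/genre='Comedy' or movie/director='B']" use="local-name()"/>

    <xsl:template match="/">
        <!-- keep the first matching element of each name -->
        <xsl:for-each select="/moviesByYear/*[movie/genre='Comedy' or movie/director='B'][generate-id() = generate-id(key('k1', local-name())[1])]">
            <xsl:if test="position() > 1"><xsl:text> </xsl:text></xsl:if>
            <xsl:value-of select="name()"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>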