XSLT 2.0 - Keep default element, remove duplicate

Posted 2019-06-03 08:52

Question:

Using XSLT 2.0.

I need to filter out all elements that have an @xml:lang attribute whose value is not in a list of allowed values that I define. Example allowed values: x-default, en, en-US, en-GB

When @xml:lang is detected on any element, and if x-default exists, then any sibling element of same type with @xml:lang value other than x-default should be compared to the element's text value of x-default, and if same element text value, be removed. To say that another way, any sibling duplicates of @xml:lang="x-default" should be removed, based on the element's text value comparison.

Bonus points if it's possible to rank the order of duplicates, such that x-default is always chosen (if exists), followed by a second tier (en, fr, ru), followed by a third tier (en-EN, en-GB, fr-FR, ru-RU), where the second tier duplicates of the first tier are removed, and the third tier is compared to second tier (if exists), or else the first tier, so that the third tier is also removed if duplicate. This would need to be handled dynamically, as there are many possible languages.

A special case that should also be considered is where the first tier (x-default) has some value, the second tier (en) has a different value, and the third tier (en-US) has the same value as the first tier. In this situation there is no duplicate to remove, because the second tier exists and the third tier does not match it.

My current XSLT (doesn't attempt removing duplicates, as I've not found a sure-fire solution yet, and any attempts on my part have failed miserably). This is not my ideal XSLT, it's just the best I know to build currently, and it's able to filter down the data set. The programmer in me would like to see all the or's changed to an array-value check so the values can be managed more cleanly, but I'm not sure if that's doable in XSLT:

<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://some.namespace/uri">

<xsl:strip-space elements="*"/>
<xsl:output omit-xml-declaration="yes" indent="yes"/>

<!-- Select everything except comment/processing instruction nodes -->
<xsl:template match="attribute()|element()|text()">
    <xsl:copy>
        <xsl:apply-templates select="attribute()|element()|text()"/>
    </xsl:copy>
</xsl:template>

<!-- Remove categories & assignments not in our whitelist -->
<xsl:template match="//*[@category-id and not(@category-id='root'
    or @category-id='men' or @category-id='men_clothing' or @category-id='men_clothing_tshirts'
    or @category-id='sales' or @category-id='sales_men' or @category-id='sales_men_tees'
    or @category-id='sales' or @category-id='sales_women' or @category-id='sales_women_tanks-teeshirts'
    or @category-id='clothing' or @category-id='clothing_teeshirts'
    or @category-id='kids' or @category-id='kids_0816'
    or @category-id='men' or @category-id='men_shoes' or @category-id='men_shoes_skate'
    or @category-id='sales' or @category-id='sales_men' or @category-id='sales _men_shoes'
    )]"/>

<!-- Remove locales not default or in whitelist -->
<xsl:template match="//*[@xml:lang and not(@xml:lang='x-default' or @xml:lang='en' or @xml:lang='en-US' or @xml:lang='en-CA' or @xml:lang='en-GB' or @xml:lang='fr' or @xml:lang='fr-FR' or @xml:lang='ru' or @xml:lang='ru-RU')]"/>

<!-- Remove empty nodes -->
<xsl:template match="*[not(normalize-space()) and not(.//@*)]"/>

</xsl:stylesheet>

Example dataset below. In a real dataset, there are many more elements, so again the logic to remove duplicate @xml:lang entries must not be hard-coded to the XPath you might deduce, but rather operate on groups of same-type data, grouped adjacent.

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://some.namespace/uri" catalog-id="catalog-products">
  <category category-id="sales_women">
    <display-name xml:lang="x-default"><![CDATA[Women's Sales]]></display-name>
    <display-name xml:lang="en"><![CDATA[Sales for Women]]></display-name>
    <display-name xml:lang="en-US"><![CDATA[Women's Sales]]></display-name>
  </category>
  <product product-id="111111111">
    <display-name xml:lang="x-default"><![CDATA[Aurora]]></display-name>
    <display-name xml:lang="de"><![CDATA[Aurora]]></display-name>
    <display-name xml:lang="en"><![CDATA[Aurora]]></display-name>
    <display-name xml:lang="en-US"><![CDATA[Aurora Fleece]]></display-name>
    <display-name xml:lang="es"><![CDATA[Aurora]]></display-name>
    <display-name xml:lang="fr"><![CDATA[Aurora]]></display-name>
    <display-name xml:lang="ru"><![CDATA[Aurora]]></display-name>
    <short-description xml:lang="de"><![CDATA[Aurora - Fleece-Top für Damen]]></short-description>
    <short-description xml:lang="en"><![CDATA[Aurora - Sweatshirt for women]]></short-description>
    <short-description xml:lang="x-default"><![CDATA[Aurora - Sweatshirt for women]]></short-description>
    <short-description xml:lang="en-US"><![CDATA[Snow Fleece & Softshells - Aurora Fleece]]></short-description>
    <short-description xml:lang="es"><![CDATA[Aurora - Top polar de mujer]]></short-description>
    <short-description xml:lang="fr"><![CDATA[Aurora - haut en polaire femme]]></short-description>
    <short-description xml:lang="ru"><![CDATA[Свитшот SomeBrand для девушек]]></short-description>
    <long-description xml:lang="de"><![CDATA[<p class="productLongDescriptionTitle"></p><p class="productLongDescriptionSubTitle">Composition</p><p>100 % Polyester</p>]]></long-description>
    <long-description xml:lang="en"><![CDATA[<p class="productLongDescriptionTitle"></p><p class="productLongDescriptionSubTitle">Composition</p><p>100% Polyester</p>]]></long-description>
    <long-description xml:lang="x-default"><![CDATA[<p class="productLongDescriptionTitle"><p class="productLongDescriptionSubTitle">Composition</p><p>100% Polyester</p>]]></long-description>
    <long-description xml:lang="en-US"><![CDATA[<p class="productLongDescriptionTitle"></p><p>The stretchy, polar fleece Aurora zip-up shields you from the elements with street-savvy style to have you standing out on the slopes and the sidewalk. Designed with a tailored fit, tech details include zippered hand warmer pockets, a lyrca binding finish, a chest pocket, flatlock seams for smooth comfort, and ergonomic seams for support. Imported. 100% polyester polar fleece.</p><p class="productLongDescriptionSubTitle">Composition</p><p>100% Polyester
Polar Fleece</p>]]></long-description>
    <long-description xml:lang="es"><![CDATA[<p class="productLongDescriptionSubTitle">Composition</p><p>100% poliéster</p>]]></long-description>
    <long-description xml:lang="fr"><![CDATA[<p class="productLongDescriptionSubTitle">Composition</p><p>100 % polyester</p>]]></long-description>
    <long-description xml:lang="ru"><![CDATA[<p>Женский свитшот SomeBrand из зимней коллекции одежды 2014. Характеристики: влаговыводящая технология DRY-FLIGHT, эластичный флис из полиэстера (250 г), теплые карманы на молнии для ладошек.</p><p class="productLongDescriptionSubTitle"></p><p>100% полиэстер</p>]]></long-description>
    <!-- === PICTURES === -->
    <images>
      <image-group view-type="hi-res">
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_frt1.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_frt2.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_bck1.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
      </image-group>
    </images>
  </product>
</catalog>

Example of the desired dataset:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://some.namespace/uri" catalog-id="catalog-products">
  <category category-id="sales_women">
    <display-name xml:lang="x-default"><![CDATA[Women's Sales]]></display-name>
    <display-name xml:lang="en"><![CDATA[Sales for Women]]></display-name>
    <display-name xml:lang="en-US"><![CDATA[Women's Sales]]></display-name>
  </category>
  <product product-id="111111111">
    <display-name xml:lang="x-default"><![CDATA[Aurora]]></display-name>
    <display-name xml:lang="en-US"><![CDATA[Aurora Fleece]]></display-name>
    <short-description xml:lang="de"><![CDATA[Aurora - Fleece-Top für Damen]]></short-description>
    <short-description xml:lang="x-default"><![CDATA[Aurora - Sweatshirt for women]]></short-description>
    <short-description xml:lang="en-US"><![CDATA[Snow Fleece & Softshells - Aurora Fleece]]></short-description>
    <short-description xml:lang="es"><![CDATA[Aurora - Top polar de mujer]]></short-description>
    <short-description xml:lang="fr"><![CDATA[Aurora - haut en polaire femme]]></short-description>
    <short-description xml:lang="ru"><![CDATA[Свитшот SomeBrand для девушек]]></short-description>
    <long-description xml:lang="x-default"><![CDATA[<p class="productLongDescriptionTitle"><p class="productLongDescriptionSubTitle">Composition</p><p>100% Polyester</p>]]></long-description>
    <long-description xml:lang="en-US"><![CDATA[<p class="productLongDescriptionTitle"></p><p>The stretchy, polar fleece Aurora zip-up shields you from the elements with street-savvy style to have you standing out on the slopes and the sidewalk. Designed with a tailored fit, tech details include zippered hand warmer pockets, a lyrca binding finish, a chest pocket, flatlock seams for smooth comfort, and ergonomic seams for support. Imported. 100% polyester polar fleece.</p><p class="productLongDescriptionSubTitle">Composition</p><p>100% Polyester
Polar Fleece</p>]]></long-description>
    <long-description xml:lang="es"><![CDATA[<p class="productLongDescriptionSubTitle">Composition</p><p>100% poliéster</p>]]></long-description>
    <long-description xml:lang="ru"><![CDATA[<p>Женский свитшот SomeBrand из зимней коллекции одежды 2014. Характеристики: влаговыводящая технология DRY-FLIGHT, эластичный флис из полиэстера (250 г), теплые карманы на молнии для ладошек.</p><p class="productLongDescriptionSubTitle"></p><p>100% полиэстер</p>]]></long-description>
    <!-- === PICTURES === -->
    <images>
      <image-group view-type="hi-res">
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_frt1.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_frt2.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_bck1.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
      </image-group>
    </images>
  </product>
</catalog>

Answer 1:

Here's something you could use as your starting point. That is, I believe it satisfies your primary request:

When @xml:lang is detected on any element, and if x-default exists, then any sibling element of same type with @xml:lang value other than x-default should be compared to the element's text value of x-default, and if same element text value, be removed.

To state it more clearly, it removes any element that satisfies all three of these conditions:

  1. it has an xml:lang attribute;
  2. the xml:lang attribute is NOT "x-default";
  3. the value of the element is equal to the value of a similarly-named, sibling element whose xml:lang attribute IS "x-default".

XSLT

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns1="http://some.namespace/uri">

<xsl:strip-space elements="*"/>
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" cdata-section-elements="ns1:display-name ns1:short-description ns1:long-description ns1:alt ns1:title"/>

<!-- index every x-default element by its parent's id plus its own local name -->
<xsl:key name="default-sibling" match="*[@xml:lang='x-default']" use="concat(generate-id(..), '|', local-name())" />

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- drop a non-default element whose text equals that of its same-named x-default sibling -->
<xsl:template match="*[@xml:lang!='x-default' and .=key('default-sibling', concat(generate-id(..), '|', local-name()))]"/>

</xsl:stylesheet>

Applied to your example input, the following result is obtained:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="http://some.namespace/uri" catalog-id="catalog-products">
  <category category-id="sales_women">
    <display-name xml:lang="x-default"><![CDATA[Women's Sales]]></display-name>
    <display-name xml:lang="en"><![CDATA[Sales for Women]]></display-name>
  </category>
  <product product-id="111111111">
    <display-name xml:lang="x-default"><![CDATA[Aurora]]></display-name>
    <display-name xml:lang="en-US"><![CDATA[Aurora Fleece]]></display-name>
    <short-description xml:lang="de"><![CDATA[Aurora - Fleece-Top für Damen]]></short-description>
    <short-description xml:lang="x-default"><![CDATA[Aurora - Sweatshirt for women]]></short-description>
    <short-description xml:lang="en-US"><![CDATA[Snow Fleece & Softshells - Aurora Fleece]]></short-description>
    <short-description xml:lang="es"><![CDATA[Aurora - Top polar de mujer]]></short-description>
    <short-description xml:lang="fr"><![CDATA[Aurora - haut en polaire femme]]></short-description>
    <short-description xml:lang="ru"><![CDATA[Свитшот SomeBrand для девушек]]></short-description>
    <long-description xml:lang="de"><![CDATA[<p class="productLongDescriptionTitle"></p><p class="productLongDescriptionSubTitle">Composition</p><p>100 % Polyester</p>]]></long-description>
    <long-description xml:lang="en"><![CDATA[<p class="productLongDescriptionTitle"></p><p class="productLongDescriptionSubTitle">Composition</p><p>100% Polyester</p>]]></long-description>
    <long-description xml:lang="x-default"><![CDATA[<p class="productLongDescriptionTitle"><p class="productLongDescriptionSubTitle">Composition</p><p>100% Polyester</p>]]></long-description>
    <long-description xml:lang="en-US"><![CDATA[<p class="productLongDescriptionTitle"></p><p>The stretchy, polar fleece Aurora zip-up shields you from the elements with street-savvy style to have you standing out on the slopes and the sidewalk. Designed with a tailored fit, tech details include zippered hand warmer pockets, a lyrca binding finish, a chest pocket, flatlock seams for smooth comfort, and ergonomic seams for support. Imported. 100% polyester polar fleece.</p><p class="productLongDescriptionSubTitle">Composition</p><p>100% Polyester
Polar Fleece</p>]]></long-description>
    <long-description xml:lang="es"><![CDATA[<p class="productLongDescriptionSubTitle">Composition</p><p>100% poliéster</p>]]></long-description>
    <long-description xml:lang="fr"><![CDATA[<p class="productLongDescriptionSubTitle">Composition</p><p>100 % polyester</p>]]></long-description>
    <long-description xml:lang="ru"><![CDATA[<p>Женский свитшот SomeBrand из зимней коллекции одежды 2014. Характеристики: влаговыводящая технология DRY-FLIGHT, эластичный флис из полиэстера (250 г), теплые карманы на молнии для ладошек.</p><p class="productLongDescriptionSubTitle"></p><p>100% полиэстер</p>]]></long-description>
<!-- === PICTURES === -->
    <images>
      <image-group view-type="hi-res">
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_frt1.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_frt2.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
        <image path="catalog-products/all/default/hi-res/111111111_aurora,v_kpv0_bck1.jpg">
          <alt xml:lang="x-default"><![CDATA[Aurora 111111111]]></alt>
          <title xml:lang="x-default"><![CDATA[Aurora 111111111]]></title>
        </image>
      </image-group>
    </images>
  </product>
</catalog>

I believe you could extend the same principle to achieve your "bonus points" and the "special case" too (provided you can state the requirements as clearly as the one above - including mutual priorities). You may have to do more than one pass to get them all, though.
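
As a rough illustration of how the tiered comparison might look in a single pass, here is a minimal XSLT 2.0 sketch. It rests on assumptions that are mine rather than stated requirements: the second-tier language of a value like en-US is taken to be whatever precedes the first hyphen (en), and every comparison is made against the original input rather than against the already filtered result. The variable names (this, peers, base-lang, base, fallback) are made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- any localized element other than the x-default one -->
  <xsl:template match="*[@xml:lang][not(@xml:lang = 'x-default')]">
    <xsl:variable name="this" select="."/>
    <!-- siblings with the same element name (this element included) -->
    <xsl:variable name="peers" select="../*[node-name(.) eq node-name($this)]"/>
    <!-- assumption: 'en' for 'en-US'; empty string for a bare 'en' -->
    <xsl:variable name="base-lang" select="substring-before(@xml:lang, '-')"/>
    <xsl:variable name="base" select="$peers[@xml:lang eq $base-lang][1]"/>
    <!-- compare against the second tier if present, else against x-default -->
    <xsl:variable name="fallback"
        select="if ($base) then $base else $peers[@xml:lang eq 'x-default'][1]"/>
    <!-- copy only when there is nothing to compare with or the text differs -->
    <xsl:if test="empty($fallback) or string(.) ne string($fallback)">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input, this should keep the en-US display-name of the sales_women category, because an en sibling exists and its text differs. The sketch omits the cdata-section-elements setting, so CDATA sections would be serialized as escaped text unless you add it to xsl:output.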

I suggest you ask a separate question (or search previous answers) regarding filtering based on a whitelist.
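
For what it's worth, if you do stay on XSLT 2.0, one possible shape for such a whitelist is a sequence held in a global parameter and tested with the general comparison operator =, which is true when the attribute equals any item of the sequence. This is only a sketch: the parameter name allowed-langs is hypothetical, and the values are copied from the locale template in the question.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- hypothetical parameter; can be overridden when the transform is invoked -->
  <xsl:param name="allowed-langs" as="xs:string*"
      select="('x-default', 'en', 'en-US', 'en-CA', 'en-GB', 'fr', 'fr-FR', 'ru', 'ru-RU')"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- remove elements whose xml:lang is not in the whitelist -->
  <xsl:template match="*[@xml:lang][not(@xml:lang = $allowed-langs)]"/>

</xsl:stylesheet>
```

Referencing a global parameter inside a match pattern is allowed in XSLT 2.0 but not in XSLT 1.0, so this form would not work with a 1.0 processor. The same pattern could be applied to the category-id list.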