remove the same node with same attribute under the

Posted 2019-07-26 01:24

Question:

I need to transform this XML input:

<root>
    <node id="a">
        <section id="a_1" method="run">
            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>

            </item>

            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>

            </item>
        </section>

        <section id="a_2" method="run">
            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>

            </item>
        </section>

    </node>

    <node id="b">
        <section id="b_1" method="create">
            <user id="b_1a" method="x">
                <attribute>

                    <origin>us</origin>
                </attribute>

            </user>
            <user id="b_1a" method="x">
                <attribute> 
                    <origin>us</origin>
                </attribute>
            </user>
            <user id="b_1b">
                <attribute>a</attribute>
            </user>
        </section>

        <section id="b_2">
            <user id="b_1a" method="x">
                <attribute>
                    <name>John</name>
                    <origin>us</origin>
                </attribute>
            </user>
        </section>
    </node>
</root>

Here is the expected output:

<root>
    <node id="a">
        <section id="a_1" method="run">
            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>                    
            </item>               
        </section>

        <section id="a_2" method="run">
            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>

            </item>
        </section>
    </node>

    <node id="b">
        <section id="b_1" method="create">
            <user id="b_1a" method="x">
                <attribute>
                    <origin>us</origin>
                </attribute>

            </user>

            <user id="b_1b">
                <attribute>a</attribute>
            </user>
        </section>

        <section id="b_2">
            <user id="b_1a" method="x">
                <attribute>
                    <name>John</name>
                    <origin>us</origin>
                </attribute>
            </user>
        </section>
    </node>
</root>

Note: two nodes are duplicates when they have the same id and method and all of their children have the same values. A node can have one or more children. We can assume that duplicates always occur within the same section (same id and method).

Is this possible? Please enlighten me.

Thanks very much.

cheers, John

Answer 1:

I. This XSLT 1.0 transformation:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <xsl:strip-space elements="*"/>

     <xsl:key name="kElemWithAttribs" match="*[@id and @method]"
      use="concat(generate-id(..), '+', name(), '+', @id, '+', @method)"/>

     <xsl:template match="node()|@*">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
     </xsl:template>

     <xsl:template match=
      "*[@id and @method
        and
         not(generate-id()
            =
             generate-id(key('kElemWithAttribs',
                             concat(generate-id(..),
                             '+',name(), '+', @id, '+', @method)
                             )[1]
                        )
             )
         ]"/>
</xsl:stylesheet>

when applied to the following XML document (the provided source, with the a_2 section duplicated to show that repeated sections are handled too):

<root>
    <node id="a">
        <section id="a_1" method="run">
            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>
            </item>
            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>
            </item>
        </section>
        <section id="a_2" method="run">
            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>
            </item>
        </section>
        <section id="a_2" method="run">
            <item id="0" method="a">
                <attribute>
                    <color>Red</color>
                    <status>1</status>
                    <condition>good</condition>
                </attribute>
            </item>
        </section>
    </node>
    <node id="b">
        <section id="b_1" method="create">
            <user id="b_1a" method="x">
                <attribute>
                    <origin>us</origin>
                </attribute>
            </user>
            <user id="b_1a" method="x">
                <attribute>
                    <origin>us</origin>
                </attribute>
            </user>
            <user id="b_1b">
                <attribute>a</attribute>
            </user>
        </section>
        <section id="b_2">
            <user id="b_1a" method="x">
                <attribute>
                    <name>John</name>
                    <origin>us</origin>
                </attribute>
            </user>
        </section>
    </node>
</root>

produces the wanted, correct result:

<root>
   <node id="a">
      <section id="a_1" method="run">
         <item id="0" method="a">
            <attribute>
               <color>Red</color>
               <status>1</status>
               <condition>good</condition>
            </attribute>
         </item>
      </section>
      <section id="a_2" method="run">
         <item id="0" method="a">
            <attribute>
               <color>Red</color>
               <status>1</status>
               <condition>good</condition>
            </attribute>
         </item>
      </section>
   </node>
   <node id="b">
      <section id="b_1" method="create">
         <user id="b_1a" method="x">
            <attribute>
               <origin>us</origin>
            </attribute>
         </user>
         <user id="b_1b">
            <attribute>a</attribute>
         </user>
      </section>
      <section id="b_2">
         <user id="b_1a" method="x">
            <attribute>
               <name>John</name>
               <origin>us</origin>
            </attribute>
         </user>
      </section>
   </node>
</root>

Explanation: Proper use of the Muenchian method for grouping, using a composite key:

  1. The identity rule copies every node "as-is".

  2. The xsl:key definition associates groups of elements with a string key value. Each group consists of all elements that have both an id and a method attribute and that share the same parent, the same name, the same string value of the id attribute, and the same string value of the method attribute.

  3. There is a single template overriding the identity template. It matches any element that has both an id and a method attribute and is not the first element (in document order) of its group. Because this template has no body, the matched elements are not processed at all and aren't copied to the output (we could say they are "deleted").

  4. Because of 3. above, only elements that are the first element of their group aren't matched by the overriding template. Thus these elements are matched by the identity template and copied to the output -- exactly as required.
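
Note that the key above identifies duplicates by parent, name, id and method only; the content of the elements is not compared. If the content should count as well, as the note in the question suggests, one possible variation is to append the string value of the element to the key. This is only a sketch, not part of the transformation above: the key name kElemWithContent is hypothetical, and it assumes that whitespace-only text nodes are stripped (xsl:strip-space) so that indentation differences do not affect the comparison.

```xml
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <!-- Required here: otherwise indentation becomes part of the string value -->
     <xsl:strip-space elements="*"/>

     <!-- Hypothetical key: the composite key from above plus the
          string value of the element (all descendant text, concatenated) -->
     <xsl:key name="kElemWithContent" match="*[@id and @method]"
      use="concat(generate-id(..), '+', name(), '+', @id, '+', @method, '+', .)"/>

     <!-- Identity rule -->
     <xsl:template match="node()|@*">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
     </xsl:template>

     <!-- Drop every element that is not the first one with its key value -->
     <xsl:template match=
      "*[@id and @method]
        [generate-id()
        !=
         generate-id(key('kElemWithContent',
                         concat(generate-id(..), '+', name(), '+',
                                @id, '+', @method, '+', .)
                         )[1]
                    )
        ]"/>
</xsl:stylesheet>
```

With this variation, two siblings that share name, id and method but contain different text are both kept. The string value only concatenates descendant text, so it cannot tell apart elements whose text is the same but whose child structure differs.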


II. XSLT 2.0 Solution:

<xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:template match="node()|@*">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
     </xsl:template>

     <xsl:template match="*[@id]">
      <xsl:copy>
        <xsl:apply-templates select="@*"/>

        <xsl:for-each-group select="*" group-by=
        "concat(generate-id(..), '+', name(), '+', @id, '+', @method)">
          <xsl:apply-templates select="."/>
        </xsl:for-each-group>
      </xsl:copy>
     </xsl:template>
</xsl:stylesheet>

Explanation: Proper use of xsl:for-each-group with the group-by attribute.
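
The grouping key above, too, looks only at name, id and method. If an element should be dropped only when it is identical in content to an earlier sibling, a possible XSLT 2.0 alternative is the standard deep-equal() function. The following is a minimal sketch under that assumption, not a replacement for the solution above; it also assumes that whitespace-only text nodes are stripped, because deep-equal() treats them as significant.

```xml
<xsl:stylesheet version="2.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>
     <!-- Assumption: indentation is not significant in the input -->
     <xsl:strip-space elements="*"/>

     <!-- Identity rule -->
     <xsl:template match="node()|@*">
         <xsl:copy>
           <xsl:apply-templates select="node()|@*"/>
         </xsl:copy>
     </xsl:template>

     <!-- Drop an element with id and method if some preceding sibling
          is deep-equal to it (same name, same attributes, same content) -->
     <xsl:template match=
      "*[@id and @method]
        [some $p in preceding-sibling::* satisfies deep-equal(., $p)]"/>
</xsl:stylesheet>
```

Each candidate is compared against all of its preceding siblings, so the cost grows quadratically with the number of siblings; for large inputs a grouping key is likely the better choice.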



Tags: xml xslt