XSLT to deep sort any generic XML on element names

Posted 2019-07-13 18:14

Question:

Is it possible to deep-sort an XML document (with attributes) by element name, without knowing its structure or its element names in advance? The sort should be based only on the elements, not on the attributes. Thanks.

Example XML:

<Customer>
  <CustomerID>ALFKI</CustomerID>
  <Order>
    <OrderID>10692</OrderID>
    <CustomerID>ALFKI</CustomerID>
    <OrderDate>1997-10-03</OrderDate>
  </Order>
  <CompanyName>Alfreds Futterkiste</CompanyName>
</Customer>

Result Expected:

<Customer>
  <CompanyName>Alfreds Futterkiste</CompanyName>
  <CustomerID>ALFKI</CustomerID>
  <Order>
    <CustomerID>ALFKI</CustomerID>
    <OrderDate>1997-10-03</OrderDate>
    <OrderID>10692</OrderID>
  </Order>
</Customer>

Update: Actual XML

<NAB>
    <jcr:content>
        <par>
            <color>
                <title>
                    <![CDATA[Rouge sangria]]>
                </title>
                <code>
                    <![CDATA[NAB]]>
                </code>
                <image_url>
                    <![CDATA[/assets/2016/x6/colors/exterior/nab.jpg]]>
                </image_url>
            </color>
        </par>
    </jcr:content>
</NAB>

UPDATE:

I just found out that sorting one specific section of the XML (under a <Handling> element) breaks things. Is it possible to modify the XSLT so that it leaves everything under <Handling> (all of its children) unsorted?

Answer 1:

How about the following stylesheet, which sorts child elements before applying templates to them?

Start with an identity template and add a template that matches elements that themselves have child elements:

<xsl:template match="*[*]">

Copy those elements to the output and apply templates to their children, sorted by element name.

XSLT Stylesheet

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:output method="xml" omit-xml-declaration="no" encoding="UTF-8" indent="yes" />
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*[*]">
        <xsl:copy>
            <xsl:apply-templates>
                <xsl:sort select="name()"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

</xsl:transform>

XML Output

<?xml version="1.0" encoding="UTF-8"?>
<Customer>
   <CompanyName>Alfreds Futterkiste</CompanyName>
   <CustomerID>ALFKI</CustomerID>
   <Order>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-10-03</OrderDate>
      <OrderID>10692</OrderID>
   </Order>
</Customer>

Please note: This solution might not give the result you expect in all contexts, for instance when there are namespaces in the document. name() returns the qualified name including any prefix, so prefixed elements are ordered by their prefix first. To sort by the local part of the name only, use local-name() as the sort key:

<xsl:sort select="local-name()"/>
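To make the difference between the two sort keys concrete, here is a small made-up input. The prefix a and the URI urn:example:a are placeholders, not taken from the question, and the ordering described assumes a typical default collation:

```xml
<!-- Hypothetical input; the namespace URI is a placeholder -->
<root xmlns:a="urn:example:a">
    <a:zeta/>
    <beta/>
</root>

<!-- Sort key name():       compares "a:zeta" with "beta", so a:zeta comes first -->
<!-- Sort key local-name(): compares "zeta" with "beta", so beta comes first -->
```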



EDIT: The stylesheet above drops the attributes of any element whose child elements are sorted, because xsl:apply-templates without a select attribute only processes child nodes, not attributes. Use this suggestion by Daniel Haley to keep attributes if there are any:

<xsl:apply-templates select="@*|node()">
   <xsl:sort select="self::*/local-name()"/>
</xsl:apply-templates>

with an explicit select attribute on xsl:apply-templates.
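As an alternative sketch (not the form suggested above), attributes and child nodes can be handled by two separate xsl:apply-templates instructions, so that the sort key is only ever evaluated for child nodes. This variant should also work with an XSLT 1.0 processor, where a function call cannot be used as a path step as in self::*/local-name():

```xml
<xsl:template match="*[*]">
    <xsl:copy>
        <!-- Copy the attributes first, unsorted -->
        <xsl:apply-templates select="@*"/>
        <!-- Then process the child nodes, ordered by local element name -->
        <xsl:apply-templates select="node()">
            <xsl:sort select="local-name()"/>
        </xsl:apply-templates>
    </xsl:copy>
</xsl:template>
```

It replaces the second template of the stylesheet; the identity template stays as it is.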


EDIT 2

I just figured out that sorting out a certain part of XML under a specific element screws up things. Will it be possible to modify the above code to omit sorting under <Handling> element tag?

Change the second template to

<xsl:template match="*[* and not(self::Handling or ancestor::Handling)]">
    <xsl:copy>
        <xsl:apply-templates>
            <xsl:sort select="name()"/>
        </xsl:apply-templates>
    </xsl:copy>
</xsl:template>

to avoid sorting the children of the Handling element. It does not sort the children of the descendant elements of Handling either. If that's not what you intended to do, change the template match to

<xsl:template match="*[* and not(self::Handling)]">

to only avoid sorting the immediate children of Handling.

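Please note: if Handling is in a namespace, the patterns above will not match it, because an unprefixed name in a match pattern refers to an element in no namespace. In that case, declare a prefix for that namespace in the stylesheet and use it in the pattern.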
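If the namespace of Handling is not known in advance, one possible workaround is to test the local name instead. This is only a sketch: it treats an element called Handling in any namespace as the one to skip, which may be broader than intended.

```xml
<!-- Sort children everywhere except in or below an element whose local name is Handling -->
<xsl:template match="*[* and not(ancestor-or-self::*[local-name() = 'Handling'])]">
    <xsl:copy>
        <!-- Keep attributes, unsorted -->
        <xsl:apply-templates select="@*"/>
        <!-- Process child nodes ordered by local element name -->
        <xsl:apply-templates select="node()">
            <xsl:sort select="local-name()"/>
        </xsl:apply-templates>
    </xsl:copy>
</xsl:template>
```

Elements in or below Handling no longer match this template, so they fall through to the identity template and are copied in document order.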