Sorting XML files

Posted 2019-08-14 00:04

Question:

Is it possible to sort XML files like the following:

<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model>

<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>

Which would become

<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>

<model name="ford">
<driver>Alice</driver>
<driver>Bob</driver>
</model>

That is, the outermost elements are sorted first, then the second outermost, and so on.

They'd need to be sorted by element name first. Can this be done? Or should I use something like BeautifulSoup to roll my own?

Answer 1:

This is a refinement of Kirill's solution. I think it better reflects the stated requirements, and it avoids the type error XSLT 2.0 raises when the sort key contains more than one value, while still working in XSLT 1.0.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" />

  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="*">
        <xsl:sort select="(@name | text())[1]"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>


Answer 2:

Try this XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" />

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()">
        <xsl:sort select="text() | @*"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>


Answer 3:

You can sort nodes by removing them from the parent node, and re-inserting them in the intended order. For example:

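from lxml import etree

def sort_key(node):
    # Sort by tag first, then by the name attribute, falling back to the text.
    return (node.tag, node.get("name") or (node.text or "").strip())

def sort_tree(tree):
    """Recursively sort the given etree in place."""
    for child in tree:
        sort_tree(child)

    # Slice assignment swaps in the sorted children in one step, so the
    # tree is never modified while it is being iterated.
    tree[:] = sorted(tree, key=sort_key)

tree = etree.fromstring(YOUR_XML)
sort_tree(tree)
print(etree.tostring(tree, pretty_print=True).decode())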
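
For readers without lxml, here is a minimal sketch of the same recursive approach using only the standard library's xml.etree.ElementTree. The sort key (tag, then the name attribute, then the stripped text) and the wrapping root element are assumptions chosen to match the example in the question; adjust them to your own data.

```python
import xml.etree.ElementTree as ET

def sort_key(element):
    # Assumed ordering: tag name, then the "name" attribute, then the text.
    return (element.tag, element.get("name", ""), (element.text or "").strip())

def sort_tree(element):
    """Recursively sort the children of `element` in place."""
    for child in element:
        sort_tree(child)
    # Slice assignment replaces all children at once with the sorted list.
    element[:] = sorted(element, key=sort_key)

# The question's fragment, wrapped in a root element so that it parses.
xml_text = """\
<root>
<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model>
<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
</root>
"""

root = ET.fromstring(xml_text)
sort_tree(root)
print(ET.tostring(root, encoding="unicode"))
```

Whitespace between elements travels with the element it follows, so the output may need re-indenting if exact formatting matters.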


Answer 4:

You don't need to sort the entire XML DOM. Instead, collect the required nodes into a list and sort that. Since the sorted order is needed during processing rather than in the file, it's better done at run time. Something like this, using minidom:

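from xml.dom import minidom

document = """\
<root>
<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model>
<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
</root>
"""

dom = minidom.parseString(document)
# Copy the matching nodes into a plain list so they can be sorted freely.
models = list(dom.getElementsByTagName("model"))
# Sort on the attribute's string value; Attr objects cannot be compared directly in Python 3.
models.sort(key=lambda model: model.getAttribute("name"))
print([model.getAttribute("name") for model in models])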
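
The same run-time approach can be extended to the nested driver elements. The sketch below is one possible way to do it, not part of the original answer: the helper driver_names and the variable names are hypothetical, and it assumes every driver element contains a single text node.

```python
from xml.dom import minidom

xml_text = """\
<root>
<model name="ford">
<driver>Bob</driver>
<driver>Alice</driver>
</model>
<model name="audi">
<driver>Carly</driver>
<driver>Dean</driver>
</model>
</root>
"""

dom = minidom.parseString(xml_text)

def driver_names(model):
    # Hypothetical helper: collect the text of each <driver> and sort it.
    # Assumes each <driver> holds exactly one text node.
    return sorted(d.firstChild.data for d in model.getElementsByTagName("driver"))

# Sort the models by their name attribute, leaving the DOM itself untouched.
models = sorted(dom.getElementsByTagName("model"),
                key=lambda m: m.getAttribute("name"))

ordered = [(m.getAttribute("name"), driver_names(m)) for m in models]
print(ordered)
```

Because only Python lists are reordered, the parsed document keeps its original order; this suits cases where the sorted view is needed for processing only.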