How to recursively iterate over XML tags in Python

Posted 2020-02-08 06:12

I am trying to iterate over all nodes in a tree using ElementTree.

I do something like:

  import xml.etree.ElementTree as ET

  tree = ET.parse("/tmp/test.xml")
  root = tree.getroot()

  for child in root:
      pass  # do something with child

The problem is that child is an Element object, not an ElementTree object, so I can't look further into it and recurse over its elements. Is there a different way to iterate over "root" so that it yields the top-level nodes in the tree (its immediate children) as objects of the same class as root itself?

Tags: python xml
4 answers
戒情不戒烟
#2 · 2020-02-08 06:40

To iterate over all nodes, use the iter method. It is available on the ElementTree, and also on any Element, where it walks the subtree rooted at that element.

The root is an Element, just like the other elements in the tree, and can be iterated to reach its own children. The ElementTree is a wrapper around the root that represents the whole document.

For example, given this XML:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

You can do the following:

>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> for elem in tree.iter():
...     print(elem)
... 
<Element 'data' at 0x10b2d7b50>
<Element 'country' at 0x10b2d7b90>
<Element 'rank' at 0x10b2d7bd0>
<Element 'year' at 0x10b2d7c50>
<Element 'gdppc' at 0x10b2d7d10>
<Element 'neighbor' at 0x10b2d7e90>
<Element 'neighbor' at 0x10b2d7ed0>
<Element 'country' at 0x10b2d7f10>
<Element 'rank' at 0x10b2d7f50>
<Element 'year' at 0x10b2d7f90>
<Element 'gdppc' at 0x10b2d7fd0>
<Element 'neighbor' at 0x10b2db050>
<Element 'country' at 0x10b2db090>
<Element 'rank' at 0x10b2db0d0>
<Element 'year' at 0x10b2db110>
<Element 'gdppc' at 0x10b2db150>
<Element 'neighbor' at 0x10b2db190>
<Element 'neighbor' at 0x10b2db1d0>
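Because every child is itself an Element, and iterating an Element yields its immediate children, explicit recursion is also possible. Here is a minimal sketch, assuming a shortened inline document in place of test.xml and a hypothetical helper named walk (not part of ElementTree):

```python
import xml.etree.ElementTree as ET

# Shortened inline document standing in for test.xml (an assumption for this sketch).
XML = """<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <neighbor name="Austria" direction="E"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
    </country>
</data>"""


def walk(elem, depth=0):
    """Yield (depth, element) pairs in document order.

    `walk` is a hypothetical helper, not an ElementTree API.
    """
    yield depth, elem
    # Iterating an Element yields its immediate children, which are Elements too.
    for child in elem:
        yield from walk(child, depth + 1)


root = ET.fromstring(XML)
visited = [(depth, elem.tag) for depth, elem in walk(root)]
for depth, tag in visited:
    print("  " * depth + tag)
```

The visiting order is the same as with root.iter(), so the recursive form is mainly worth it when the nesting depth (or the parent) is needed along the way.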
别忘想泡老子
#3 · 2020-02-08 06:43

You can also access specific elements like this:

country = tree.findall('.//country')

Then loop over range(len(country)) and access each element by index.
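The list returned by findall can also be iterated directly, without indexing. Here is a sketch, assuming a shortened inline copy of the sample document and an illustrative ranks dictionary that is not in the original answer:

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the sample document (an assumption for this sketch).
XML = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
    <country name="Panama"><rank>68</rank></country>
</data>"""

tree = ET.ElementTree(ET.fromstring(XML))

# './/country' matches every <country> element at any depth below the root.
countries = tree.findall(".//country")

ranks = {}
for country in countries:
    # get() reads an attribute; findtext() returns the text of the first matching child.
    ranks[country.get("name")] = int(country.findtext("rank"))

print(ranks)
```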

一纸荒年 Trace。
#4 · 2020-02-08 06:44

Adding to Robert Christie's answer, it is also possible to iterate over all nodes when parsing with fromstring(), by wrapping the Element it returns in an ElementTree:

import xml.etree.ElementTree as ET

# xml_string holds the XML document as a string
e = ET.ElementTree(ET.fromstring(xml_string))
for elt in e.iter():
    print("%s: '%s'" % (elt.tag, elt.text))
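The Element returned by fromstring() has an iter() method of its own, and iter() accepts an optional tag name to restrict the results. Here is a small sketch, assuming an inline xml_string cut down from the sample document:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for xml_string (an assumption for this sketch).
xml_string = """<data>
    <country name="Panama">
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

root = ET.fromstring(xml_string)  # fromstring() returns the root Element

# iter() with a tag name yields only the matching elements in the subtree.
neighbors = [elt.get("name") for elt in root.iter("neighbor")]
print(neighbors)
```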
Anthone
#5 · 2020-02-08 06:44

In addition to Robert Christie's accepted answer, printing the values and tags separately is very easy:

import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
for elem in tree.iter():
    print(elem.tag, elem.text)
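One thing to watch for when printing elem.text: for an element that only contains other elements it is the whitespace between the tags, and for an empty element such as a self-closing neighbor it is None. Here is a sketch of one way to normalize that, assuming a shortened inline document instead of test.xml:

```python
import xml.etree.ElementTree as ET

# Shortened inline document standing in for test.xml (an assumption for this sketch).
XML = """<data>
    <country name="Singapore">
        <rank>4</rank>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

root = ET.fromstring(XML)

pairs = []
for elem in root.iter():
    # text may be None (empty element) or whitespace only (container element).
    text = (elem.text or "").strip()
    pairs.append((elem.tag, text))
    print(elem.tag, repr(text))
```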