Python - How to determine hierarchy level of parsed XML elements

Posted 2019-06-22 12:58

I am trying to parse elements with certain tags from an XML file with Python and generate an Excel document that contains those elements and preserves their hierarchy.

My problem is that I cannot figure out how deeply nested each element (over which the parser iterates) is.

XML sample extract (three element types, which can be nested arbitrarily within one another):

<A>
   <B>
      <C>
      </C>
   </B>
</A>
<B>
    <A>
    </A>
</B>

The following code, using ElementTree, works well for iterating over the elements, but I don't think ElementTree can tell me how deeply each element is nested. See below:

import xml.etree.ElementTree as ET

tree = ET.parse('XML_file.xml')  # parse() returns an ElementTree
root = tree.getroot()            # getroot() returns the root Element
for element in root.iter():
    if element.tag in ("A","B","C"):
        print(element.tag)

This gets me the elements A, B, C in the right order, but I need to print them together with their level.

So not only:

A
B
C
B
A

But something like:

A
--B
----C
B
--A

To do this, I need the level of each element. Is there a suitable parser for Python that can do this easily? I imagine something like "element.hierarchyLevel" that returns an integer index...

2 answers
Ridiculous
#2 · 2019-06-22 13:18

Try using a recursive function that keeps track of your "level".

import xml.etree.ElementTree as ET

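def perf_func(elem, func, level=0):
    # Apply func to this element, then recurse into its children one level deeper.
    func(elem, level)
    for child in elem:  # iterating an Element yields its children; getchildren() was removed in Python 3.9
        perf_func(child, func, level + 1)

def print_level(elem, level):
    print('-' * level + elem.tag)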

root = ET.parse('XML_file.xml')
perf_func(root.getroot(), print_level)
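
If you want the levels as data (for example, to write rows to a spreadsheet) rather than printed, the same recursion can be written as a generator. This is only a sketch: the names walk, outline and SAMPLE are made up for illustration, and the sample is the question's extract wrapped in a single <root> element so that it is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the question's extract wrapped in one <root>
# element, since a well-formed document needs a single root.
SAMPLE = "<root><A><B><C/></B></A><B><A/></B></root>"

def walk(elem, level=0):
    """Yield (level, element) pairs in document order."""
    yield level, elem
    for child in elem:  # iterating an Element yields its direct children
        yield from walk(child, level + 1)

def outline(root, tags=("A", "B", "C")):
    """Return one indented line per element whose tag is in `tags`."""
    return ["--" * level + elem.tag
            for level, elem in walk(root)
            if elem.tag in tags]

lines = outline(ET.fromstring(SAMPLE))
print("\n".join(lines))
```

Because the wrapper <root> sits at level 0 and is filtered out, the top-level A and B come out at level 1 here; subtract 1 from the level if you want them flush left.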
Rolldiameter
#3 · 2019-06-22 13:22

You could use xml.sax with a handler derived from xml.sax.handler.ContentHandler:

import xml.sax as sax
import xml.sax.handler as saxhandler

class TreeBuilder(saxhandler.ContentHandler):
    # http://docs.python.org/library/xml.sax.handler.html#contenthandler-objects
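    def __init__(self):
        super().__init__()
        self.level = 0  # current nesting depth
    def startElement(self, name, attrs):
        # Print the tag indented by its depth, then descend one level.
        print('--'*self.level + name)
        self.level += 1
    def endElement(self, name):
        # Leaving the element: go back up one level.
        self.level -= 1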
builder = TreeBuilder()
src = '''\
<root>
<A>
   <B>
      <C>
      </C>
   </B>
</A>
<B>
    <A>
    </A>
</B>
</root>
'''
sax.parseString(src, builder)

yields

root
--A
----B
------C
--B
----A
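
The same depth-counter idea should also work without SAX, using ElementTree's iterparse with "start" and "end" events. Note that this swaps in a different API from the handler above; the function name levels and the SAMPLE bytes are hypothetical, chosen only for this sketch.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample, same structure as the src string used above.
SAMPLE = b"<root><A><B><C/></B></A><B><A/></B></root>"

def levels(source):
    """Return (level, tag) pairs using start/end events and a depth counter."""
    rows = []
    level = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            rows.append((level, elem.tag))  # record depth on entry
            level += 1
        else:  # "end": leaving the element, go back up one level
            level -= 1
    return rows

rows = levels(io.BytesIO(SAMPLE))
for level, tag in rows:
    print("--" * level + tag)
```

iterparse also accepts a filename, so levels('XML_file.xml') would be the equivalent call for a file on disk.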