How to remove a node inside an iterator in Python

Posted 2020-03-26 06:15

Question:

How can I remove the current node while iterating over all nodes from the root with the getiterator() function?

import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()

for node in root.getiterator():
    # if some condition:
    #     remove(node)
    pass

Answer 1:

You can't remove nodes without knowing the parent, but the xml.etree package doesn't give you any way to access a parent from a given node.

The only way around this is matching the parent node instead:

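for node in root.iter():
    if some_condition_matches_parent:
        # Copy the direct children first: remove() only accepts direct
        # children, and the copy keeps the loop stable while removing.
        for child in list(node):
            if some_condition_matches_child:
                node.remove(child)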
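
As a concrete, runnable sketch of the parent-matching approach: the sample document, the tag names group and item, and the obsolete attribute are all invented for illustration and are not from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tags and attributes are made up.
XML = """
<root>
  <group name="a">
    <item obsolete="yes">1</item>
    <item>2</item>
  </group>
  <group name="b">
    <item obsolete="yes">3</item>
  </group>
</root>
"""

root = ET.fromstring(XML)

# Snapshot the parents up front so the tree can be changed freely.
for node in list(root.iter("group")):
    # list(node) copies the direct children of this parent.
    for child in list(node):
        if child.get("obsolete") == "yes":
            node.remove(child)

remaining = [item.text for item in root.iter("item")]
print(remaining)
```

Only the item without the obsolete attribute should be left afterwards. Note that both loops iterate over copies, so neither is affected by the removals.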

If you switch to the lxml library (which implements the same API, but with additional enhancements), you can retrieve the parent node from any given node:

node.getparent().remove(node)
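
If switching to lxml is not an option, a child-to-parent lookup can be built by hand with the standard library. The following is only a sketch and not part of the original answer; build_parent_map and remove_matching are hypothetical helper names, and the sample XML is made up.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    """Map every element below root to its direct parent."""
    return {child: parent for parent in root.iter() for child in parent}


def remove_matching(root, predicate):
    """Remove every non-root element for which predicate(element) is true.

    Returns the number of elements that were removed.
    """
    parent_map = build_parent_map(root)
    removed = 0
    # Iterate over a snapshot so removals do not affect the traversal.
    for node in list(root.iter()):
        parent = parent_map.get(node)  # None for the root itself
        if parent is not None and predicate(node):
            parent.remove(node)
            removed += 1
    return removed


# Hypothetical usage: drop every <b> element, at any depth.
root = ET.fromstring("<root><a><b/></a><b/><c/></root>")
count = remove_matching(root, lambda el: el.tag == "b")
tags = [el.tag for el in root.iter()]
print(count, tags)
```

The map is built once before any removal, so it costs one full pass over the tree. It also goes stale as soon as the tree changes, which is acceptable here because each element is looked up only once.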

Note that while the pure-Python implementation of Element.getiterator() returns a list object, in the C implementation of the ElementTree module (a separate import on Python 2, transparently imported on Python 3 if available) getiterator() returns a live generator, so its results must be copied into a list before you remove anything inside the loop.
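
The need for a copy can be seen with any live view of the tree. The sketch below uses plain iteration over an element's children rather than getiterator(); remove_all_children is a hypothetical helper and the four-child document is made up.

```python
import xml.etree.ElementTree as ET


def remove_all_children(parent, copy_first):
    """Try to remove every child; return how many are left afterwards."""
    # Without the copy, the loop walks the same sequence it is shrinking.
    children = list(parent) if copy_first else parent
    for child in children:
        parent.remove(child)
    return len(parent)


SAMPLE = "<r><a/><a/><a/><a/></r>"  # hypothetical input

left_without_copy = remove_all_children(ET.fromstring(SAMPLE), copy_first=False)
left_with_copy = remove_all_children(ET.fromstring(SAMPLE), copy_first=True)
print(left_without_copy, left_with_copy)
```

Without the copy, some children are skipped and survive the loop. With the copy, every child is visited and removed.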

On top of that, the Element.getiterator() method was deprecated in Python 3.2 and removed altogether in Python 3.9. I replaced its use with iter() in the loops above, copying the inner sequence into a list before removing from it.