I have this XML input file:
<?xml version="1.0"?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second>
      <third-num>2</third-num>
      <third-def>object002</third-def>
      <third-len>426</third-len>
    </second>
    <second>
      <third-num>3</third-num>
      <third-def>object003</third-def>
      <third-len>998</third-len>
    </second>
  </First>
</zero>
My goal is to remove every second-level element whose <third-def> does not have a given value. To do that, I wrote this code:
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

inputfile = 'inputfile.xml'
tree = ET.parse(inputfile)
root = tree.getroot()
elem = tree.find('First')
for elem2 in tree.iter(tag='second'):
    if elem2.find('third-def').text == 'object001':
        pass
    else:
        elem.remove(elem2)
        #elem2.clear()
My problem is elem.remove(elem2). It skips every other second level. Here is the output of this code:
<?xml version="1.0" ?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second>
      <third-num>3</third-num>
      <third-def>object003</third-def>
      <third-len>998</third-len>
    </second>
  </First>
</zero>
Now if I un-comment the elem2.clear() line, the script works perfectly, but the output is less nice as it keeps all the removed second levels:
<?xml version="1.0" ?>
<zero>
  <First>
    <second>
      <third-num>1</third-num>
      <third-def>object001</third-def>
      <third-len>458</third-len>
    </second>
    <second/>
    <second/>
  </First>
</zero>
Does anybody have a clue why my element.remove() statement is wrong?
You are looping over the live tree with tree.iter(tag='second'), which you then change while iterating. The 'counter' of the iteration won't be told about the changed number of elements, so when looking at element 0 and removing that element, the iterator then moves on to element number 1. But what was element number 1 is now element number 0, so that element is never visited.
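To make the skipping visible, here is a minimal sketch that records which elements the loop actually visits. It assumes a trimmed copy of the question's XML inlined as a string; the names XML, first and visited are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's input (assumption: only <third-def> matters here)
XML = """<zero><First>
<second><third-def>object001</third-def></second>
<second><third-def>object002</third-def></second>
<second><third-def>object003</third-def></second>
</First></zero>"""

root = ET.fromstring(XML)
first = root.find('First')

visited = []
# Buggy on purpose: iterating over the live element while removing from it
for second in first.iter('second'):
    name = second.find('third-def').text
    visited.append(name)
    if name != 'object001':
        first.remove(second)

remaining = [s.find('third-def').text for s in first.findall('second')]
print(visited)    # ['object001', 'object002']
print(remaining)  # ['object001', 'object003']
```

The third element never shows up in visited: once object002 is removed, object003 slides into the slot the iterator has already passed.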
Capture a list of all the elements first, using .findall(), then loop over that list. .findall() returns a list of results, which doesn't update as you alter the tree, so the iteration no longer skips the last element.
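A minimal sketch of that approach, assuming the same trimmed XML inlined as a string and a direct-child search with findall('second') on the First element (the names XML, first and remaining are only for illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's input (assumption: only <third-def> matters here)
XML = """<zero><First>
<second><third-def>object001</third-def></second>
<second><third-def>object002</third-def></second>
<second><third-def>object003</third-def></second>
</First></zero>"""

root = ET.fromstring(XML)
first = root.find('First')

# findall() builds its list up front, so removals below cannot disturb the loop
for second in first.findall('second'):
    if second.find('third-def').text != 'object001':
        first.remove(second)

remaining = [s.find('third-def').text for s in first.findall('second')]
print(remaining)  # ['object001']
```

Wrapping the original iterator in list(...) should take the same kind of snapshot, if you prefer to keep using .iter().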
This phenomenon is not limited to ElementTree trees; see Loop "Forgets" to Remove Some Items
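For illustration, the same effect can be reproduced with a plain list, along with the usual fix of looping over a copy. The list contents are made up to mirror the question.

```python
# Buggy: removing from the list that is being iterated
names = ['object001', 'object002', 'object003']
for name in names:
    if name != 'object001':
        names.remove(name)
print(names)  # ['object001', 'object003']

# Fixed: iterate over a shallow copy, remove from the original
fixed = ['object001', 'object002', 'object003']
for name in fixed[:]:
    if name != 'object001':
        fixed.remove(name)
print(fixed)  # ['object001']
```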