I have an XML file like this:
<hierachy>
    <att>
        <Order>1</Order>
        <attval>Data</attval>
        <children>
            <att>
                <Order>1</Order>
                <attval>Studyval</attval>
            </att>
            <att>
                <Order>2</Order>
                <attval>Site</attval>
            </att>
        </children>
    </att>
    <att>
        <Order>2</Order>
        <attval>Info</attval>
        <children>
            <att>
                <Order>1</Order>
                <attval>age</attval>
            </att>
            <att>
                <Order>2</Order>
                <attval>gender</attval>
            </att>
        </children>
    </att>
</hierachy>
I'm trying to convert it to a CSV file like this:
Data,Studyval
Data,Site
Info,age
Info,gender
My problem is that both the parent and the child elements use the same tag names, 'att' and 'attval'. How do I tell Python to distinguish between the two and give me this output?
I tried this:
import xml.etree.cElementTree as ET
tree = ET.parse('input.xml')
rebase = tree.getroot()
list = []
for att in rebase.findall('att'):
    name = att.find('attval').text
    for each_att in att.findall('attval'):
        try:
            val = att.find('attval').text
            print name, val
        except AttributeError:
            print name
and it printed the same things twice.
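Do not use the findall function the way you do here: att.findall('attval') only matches the attval directly under each top-level att, never the ones nested under children, and the loop body then reads the parent's own attval again, so the parent name is printed twice. Instead, just iterate the tree in order from top to bottom and grab the relevant elements at each level.
Which gives: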
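A minimal sketch of such a top-down walk, not necessarily the exact code behind the output above. It assumes the structure shown in the question (each top-level att has an attval and an optional children element holding more att elements). The helper name parent_child_rows is hypothetical, and the XML is inlined as a string so the snippet runs on its own; with a real file you would use ET.parse('input.xml').getroot() instead.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (inlined so the sketch is self-contained).
SAMPLE = """<hierachy>
<att><Order>1</Order><attval>Data</attval>
  <children>
    <att><Order>1</Order><attval>Studyval</attval></att>
    <att><Order>2</Order><attval>Site</attval></att>
  </children>
</att>
<att><Order>2</Order><attval>Info</attval>
  <children>
    <att><Order>1</Order><attval>age</attval></att>
    <att><Order>2</Order><attval>gender</attval></att>
  </children>
</att>
</hierachy>"""


def parent_child_rows(root):
    """Yield (parent_name, child_name) pairs, walking the tree top-down."""
    for parent in root:  # iterating an element visits its direct children only
        if parent.tag != 'att':
            continue
        parent_name = parent.findtext('attval')  # the parent's own attval
        children = parent.find('children')       # direct <children> element, if any
        if children is None:
            continue  # a parent without children produces no rows in this sketch
        for child in children:
            if child.tag == 'att':
                yield parent_name, child.findtext('attval')


root = ET.fromstring(SAMPLE)
rows = list(parent_child_rows(root))
for parent_name, child_name in rows:
    print("%s,%s" % (parent_name, child_name))
```

For the XML in the question this prints:

Data,Studyval
Data,Site
Info,age
Info,gender

Because the parent and child lookups happen at different depths of the walk, the shared tag names 'att' and 'attval' never collide. To write an actual file, the same rows can be passed to csv.writer(...).writerows(rows).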