I have an xml file
<temp>
<email id="1" Body="abc"/>
<email id="2" Body="fre"/>
.
.
<email id="998349883487454359203" Body="hi"/>
</temp>
I want to read the xml file for each email tag. That is, at a time I want to read email id=1..extract body from it, the read email id=2...and extract body from it...and so on
I tried to do this using DOM model for XML parsing, since my file size is 100 GB..the approach does not work. I then tried using:
from xml.etree import ElementTree as ET
tree=ET.parse('myfile.xml')
root=ET.parse('myfile.xml').getroot()
for i in root.findall('email/'):
print i.get('Body')
Now once I get the root..I am not getting why is my code not been able to parse.
The code upon using iterparse is throwing the following error:
"UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 437: ordinal not in range(128)"
Can somebody help