I have a set of XML files that I need to read and format into a single CSV file. In order to read from the XML files, I have used the solution mentioned here.
My code looks like this:
from os import listdir
import xml.etree.cElementTree as et
files = listdir(".../blogs/")
for i in range(len(files)):
# fname = ".../blogs/" + files[i]
f = open(".../blogs/" + files[i], 'r')
contents = f.read()
tree=et.fromstring(contents)
for el in tree.findall('post'):
post = el.text
f.close()
This gives me the error cElementTree.ParseError: undefined entity:
at the line tree=et.fromstring(contents)
. Oddly enough, when I run each of the commands on command line Python (without the for-loop though), it runs perfectly.
In case you want to know the XML structure, it is like this:
<Blog>
<date> some date </date>
<post> some blog post </post>
</Blog>
So what is causing this error, and how come it doesn't run from the Python file, but runs from the command line?
Update: After reading this link I checked files[0]
and found that '&' symbol occurs a few times. I think that might be causing the problem. I used a random file to read when I ran the same commands on command line.