XML parsing in Python: ExpatError not well-formed

Posted 2020-03-10 08:35

I'm using Python's xml.etree.ElementTree to do some XML parsing on a file. However, I get this error mid-way through the document:

xml.parsers.expat.ExpatError: not well-formed (invalid token): line X, column Y

So I go to line X, column Y in vim and I see an ampersand (&) with red background highlighting. What does this mean?

Also the two characters preceding it are >>, so maybe there's something special about >>&?

Anyone know how to fix this?

3 answers
三岁会撩人
Reply #2 · 2020-03-10 09:23

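The & is a special character in XML, used to introduce entity and character references. If your XML has an & sitting there by itself, not as part of a reference like &amp;amp; or &amp;#x0450; or the like, then the XML is not well-formed. A literal ampersand has to be written as &amp;amp;.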
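
To see the difference, here is a minimal sketch that parses the same text with a bare ampersand and with the ampersand written as an entity. The sample strings and the helper name try_parse are made up for illustration. On Python 3, ElementTree reports the problem as xml.etree.ElementTree.ParseError rather than a raw ExpatError, so that is what the sketch catches.

```python
import xml.etree.ElementTree as ET

bad = "<root>a >>& b</root>"       # bare ampersand: not well-formed
good = "<root>a >>&amp; b</root>"  # ampersand written as an entity

def try_parse(text):
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError as err:
        # err.position holds the (line, column) the parser complained about
        return "error: {}".format(err)

bad_result = try_parse(bad)
good_result = try_parse(good)
print(bad_result)
print(good_result)
```

The >> in front of the ampersand is harmless here; only the unescaped & trips the parser.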

ゆ 、 Hurt°
Reply #3 · 2020-03-10 09:25

You can use the escape function from the xml.sax.saxutils module:

from xml.sax.saxutils import escape

my_string = "Some string with an &"

# If the string contains &, <, or > they will be converted.
print(escape(my_string))

# The above prints: Some string with an &amp;

Reference: Escaping strings for use in XML
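
Note that escape() is meant for text before it goes into the document. If the XML already exists and mixes stray ampersands with valid entities, running escape() over the whole thing would double-escape the valid ones and also mangle the tags. A rough sketch of a workaround, assuming the only problem is bare & characters, is to escape just those with a regular expression. The helper name escape_bare_ampersands and the sample string are hypothetical, and the pattern is a heuristic: it does not know about CDATA sections or comments, where a bare & is legal.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" unless it already starts an entity or character reference
# such as &amp; or &#1104; or &#x450;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(text):
    """Replace stray '&' with '&amp;' while leaving existing references alone."""
    return BARE_AMP.sub("&amp;", text)

raw = "<root>a >>& b &amp; c &#x450;</root>"
fixed = escape_bare_ampersands(raw)
root = ET.fromstring(fixed)  # parses now that the stray & is escaped
print(fixed)
```

Fixing whatever produces the file so that it escapes text properly is the more robust option; patching the text afterwards is only a stopgap.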

smile是对你的礼貌
Reply #4 · 2020-03-10 09:29

I solved it by using yattag instead:

from yattag import indent

# xml_string is assumed to be a str holding the XML document (Python 3).
print(indent(xml_string))