I'm using Python's xml.etree.ElementTree to do some XML parsing on a file. However, I get this error midway through the document:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line X, column Y
So I go to line X, column Y in vim and I see an ampersand (&) with red background highlighting. What does this mean?
Also, the two characters preceding it are >>, so maybe there's something special about >>&?
Anyone know how to fix this?
The & is a special character in XML: it introduces entity and character references. If your XML has an & sitting there by itself, not as part of a reference like &amp; or &#x0450; or the like, then the XML is not well-formed. The >> in front of it is not the problem.
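To see the difference for yourself, here is a minimal sketch. The two sample documents are made up for illustration, and it assumes Python 3, where ElementTree reports the failure as ET.ParseError rather than a raw ExpatError:

```python
import xml.etree.ElementTree as ET

# Illustrative inputs (not from the original file).
bad = "<item>a >>& b</item>"       # bare ampersand: not well-formed
good = "<item>a >>&amp; b</item>"  # ampersand written as an entity reference

try:
    ET.fromstring(bad)
    bad_error = None
except ET.ParseError as exc:
    # The message has the same "not well-formed (invalid token)" form
    # as the error in the question.
    bad_error = exc
    print("parse failed:", exc)

# The escaped version parses, and the parser hands back a plain "&".
parsed_text = ET.fromstring(good).text
print(parsed_text)
```

Note that the >> characters are accepted as ordinary text in both documents; only the unescaped & trips the parser.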
You can use the escape function from the xml.sax.saxutils module:
from xml.sax.saxutils import escape
my_string = "Some string with an &"
# Any &, <, or > in the string is converted to &amp;, &lt;, or &gt;.
print(escape(my_string))
# The above prints: Some string with an &amp;
Reference: Escaping strings for use in XML
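Keep in mind that escape is meant for individual text values before they go into a document; running it over a whole XML file would escape the tags as well. If you are stuck with a file that already contains stray ampersands, one possible workaround is to pre-process the text before parsing. The sketch below is an assumption on my part, not something from the standard library: escape_bare_ampersands is a hypothetical helper, and the regex only leaves alone things that look like &name;, &#123; or &#x1F;.

```python
import re
import xml.etree.ElementTree as ET

# Matches an "&" that does NOT start a named entity (&amp;), a decimal
# character reference (&#1104;), or a hex character reference (&#x450;).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(text):
    """Hypothetical helper: replace stray '&' characters with '&amp;'."""
    return BARE_AMP.sub("&amp;", text)

# Illustrative input: one bare ampersand and one that is already escaped.
raw = "<item>a >>& b &amp; c</item>"
fixed = escape_bare_ampersands(raw)
print(fixed)

root = ET.fromstring(fixed)
print(root.text)
```

This is only a rough repair. It will also rewrite ampersands inside CDATA sections, where they are legal as-is, and it will not help with references to entities the parser does not know. Fixing whatever produces the file is the better long-term answer.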
I solved it by using yattag instead:
from yattag import indent
# xml_string is assumed to hold the XML document as a str
print(indent(xml_string))