ParseError: not well-formed (invalid token) using-第2页回答

I receive xml strings from an external source that can contains unsanitized user contributed content.

The following xml string gave a ParseError in cElementTree:

>>> print repr(s)
'<Comment>dddddddd\x08\x08\x08\x08\x08\x08_____</Comment>'
>>> import xml.etree.cElementTree as ET
>>> ET.XML(s)

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    ET.XML(s)
  File "<string>", line 106, in XML
ParseError: not well-formed (invalid token): line 1, column 17

Is there a way to make cElementTree not complain?

标签： python parsing elementtree

9条回答

Ridiculous、

2楼-- · 2019-01-26 05:11

This is most probably an encoding error. For example I had an xml file encoded in UTF-8-BOM (checked from the Notepad++ Encoding menu) and got similar error message.

The workaround (Python 3.6)

import io
from xml.etree import ElementTree as ET

with io.open(file, 'r', encoding='utf-8-sig') as f:
    contents = f.read()
    tree = ET.fromstring(contents)

Check the encoding of your xml file. If it is using different encoding, change the 'utf-8-sig' accordingly.

0人赞添加讨论(0) 举报

一纸荒年 Trace。

3楼-- · 2019-01-26 05:14

What helped me with that error was Juan's answer - https://stackoverflow.com/a/20204635/4433222 But wasn't enough - after struggling I found out that an XML file needs to be saved with UTF-8 without BOM encoding.

The solution wasn't working for "normal" UTF-8.

0人赞添加讨论(0) 举报

Root（大扎）

4楼-- · 2019-01-26 05:20

I have been in stuck with similar problem. Finally figured out the what was the root cause in my particular case. If you read the data from multiple XML files that lie in same folder you will parse also .DS_Store file. Before parsing add this condition

for file in files:
    if file.endswith('.xml'):
       run_your_code...

This trick helped me as well

0人赞添加讨论(0) 举报

上一页 1 2

ParseError: not well-formed (invalid token) using

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间