Can't parse simple JSON with Python

I have a very simple piece of JSON that I cannot parse with the simplejson module. To reproduce:

import simplejson as json
json.loads(r'{"translatedatt1":"Vari\351es"}')

Result:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.5/simplejson/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/pymodules/python2.5/simplejson/decoder.py", line 335, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/pymodules/python2.5/simplejson/decoder.py", line 351, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid \escape: line 1 column 23 (char 23)

Does anyone have an idea what is wrong, and how to parse the JSON above correctly?

The string that was encoded is: Variées

PS: I'm using Python 2.5.

Many thanks!

Answer 1:

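The parser is entirely correct here; Vari\351es contains an invalid escape. The JSON standard does not allow a \ followed by just digits.

Whatever code produced this output should be fixed. If that is not possible, you will need to use a regular expression to either remove those escapes or replace them with valid ones.

If we interpret 351 as an octal number, it points to Unicode code point U+00E9, the é character (LATIN SMALL LETTER E WITH ACUTE). You can "repair" your JSON input with: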
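Before choosing between removing and replacing, it can help to locate the offending escapes. The following is a minimal sketch, written for Python 3 with only the standard library (an assumption; the question uses Python 2.5). The helper name `find_invalid_escapes` is hypothetical. It relies on JSON permitting only `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t` and `\uXXXX` after a backslash. It scans the whole text rather than only string values, so treat it as a rough diagnostic.

```python
import re

# A backslash followed by either a valid JSON escape (group 1)
# or anything else, including nothing at end of input (group 2).
_ESCAPE = re.compile(r'\\(?:(["\\/bfnrt]|u[0-9a-fA-F]{4})|(.?))', re.DOTALL)


def find_invalid_escapes(text):
    """Return (offset, escape) pairs for escapes that JSON does not allow."""
    return [
        (match.start(), match.group(0))
        for match in _ESCAPE.finditer(text)
        if match.group(1) is None  # no valid escape followed the backslash
    ]


print(find_invalid_escapes(r'{"translatedatt1":"Vari\351es"}'))
```

For the input from the question this reports offset 23, the same position the traceback complains about.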

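import re

# A backslash followed by 1-6 octal digits; six digits cover code points up to U+FFFF.
invalid_escape = re.compile(r'\\[0-7]{1,6}')

def replace_with_codepoint(match):
    # Drop the leading backslash and read the remaining digits as an octal code point.
    return unichr(int(match.group(0)[1:], 8))


def repair(brokenjson):
    # Replace every octal-looking escape with the character it denotes.
    return invalid_escape.sub(replace_with_codepoint, brokenjson)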

With repair() your example can be loaded:

>>> json.loads(repair(r'{"translatedatt1":"Vari\351es"}'))
{u'translatedatt1': u'Vari\xe9es'}

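You may need to adjust how the code points are interpreted; I chose octal (because Variées is an actual word), but you need to test this further with other code points.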
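One way to test the interpretation is to decode the same input under several bases and look at which result is a plausible word. This is only a sketch under stated assumptions: it targets Python 3 and the standard `json` module rather than simplejson on Python 2.5, the names `repair_with_base` and `candidates` are hypothetical, and it assumes escapes of at most three decimal digits. Unlike the snippet above, it writes valid `\uXXXX` escapes instead of raw characters and leaves other backslash pairs (such as an escaped backslash before digits) untouched.

```python
import json
import re

# A backslash followed by 1-3 digits (group 1), or any other backslash
# pair (group 2), which is kept as-is so that \\351 is not misread.
_PAIR = re.compile(r'\\(?:([0-9]{1,3})|(.))', re.DOTALL)


def repair_with_base(broken_json, base):
    """Rewrite digit escapes as \\uXXXX, reading the digits in `base`."""
    def fix(match):
        digits = match.group(1)
        if digits is None:
            return match.group(0)  # some other escape: leave it alone
        return '\\u%04x' % int(digits, base)
    return _PAIR.sub(fix, broken_json)


def candidates(broken_json):
    """Decode the input under octal, decimal and hex readings."""
    results = {}
    for base in (8, 10, 16):
        try:
            results[base] = json.loads(repair_with_base(broken_json, base))
        except ValueError:  # digits invalid for this base, or JSON still broken
            results[base] = None
    return results


for base, value in candidates(r'{"translatedatt1":"Vari\351es"}').items():
    print(base, value)
```

For the question's input, only the octal reading yields U+00E9 (é); the decimal reading gives U+015F and the hex reading gives U+0351, neither of which produces a real word here.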

Answer 2:

Perhaps you did not mean to use a raw string, but a Unicode string?

>>> import simplejson as json
>>> json.loads(u'{"translatedatt1":"Vari\351es"}')
{u'translatedatt1': u'Vari\xe9es'}
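The reason this works is that, in a non-raw literal, Python itself interprets `\351` as an octal escape before the JSON parser ever sees the text. A minimal sketch of the difference, assuming Python 3 (where every `str` is Unicode and the `u` prefix is optional) and the standard `json` module; the helper `parses` is hypothetical:

```python
import json

# Raw literal: the four characters \ 3 5 1 reach the JSON parser unchanged.
raw_text = r'{"translatedatt1":"Vari\351es"}'

# Ordinary literal: Python reads \351 as an octal escape (U+00E9) while
# compiling the literal, so the parser never sees a backslash.
cooked_text = '{"translatedatt1":"Vari\351es"}'


def parses(text):
    """Return True if json.loads accepts the text."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


print(parses(raw_text), parses(cooked_text))
```

This only helps when the text is a literal in your own source code. Data read from a file or a network response arrives with the backslash intact, like the raw literal.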

If you want to escape characters inside the JSON string itself, you need to use \uNNNN:

>>> json.loads(r'{"translatedatt1":"Vari\u351es"}')
{'translatedatt1': u'Vari\u351es'}

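Note that the resulting dictionary is slightly different in this case. When parsing a unicode string, simplejson uses unicode strings for the keys. Otherwise, it uses byte string keys.

In fact, if your JSON data really does contain \351e, then it is simply broken and is not valid JSON.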
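A `\u` escape takes exactly four hexadecimal digits, so the é from the question is written `\u00e9`; the `\u351e` in the example above is a different code point (U+351E) followed by a plain `s`. As a hedged illustration, assuming Python 3 and the standard `json` module rather than simplejson, this is what a well-behaved producer would emit for the original string:

```python
import json

original = {"translatedatt1": "Vari\u00e9es"}  # the word "Variées"

# With the default ensure_ascii=True, non-ASCII characters are written
# as \uXXXX escapes, so the output is plain ASCII and valid JSON.
encoded = json.dumps(original)
print(encoded)

# Decoding restores the original text.
round_tripped = json.loads(encoded)
print(round_tripped == original)
```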

Source: Can't parse simple json with python