Python can't parse JSON with extra trailing comma

Posted 2020-04-14 07:20

This code:

import json
s = '{ "key1": "value1", "key2": "value2", }'
json.loads(s)

produces this error in Python 2:

ValueError: Expecting property name: line 1 column 16 (char 15)

Similar result in Python 3:

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 16 (char 15)

If I remove that trailing comma (after "value2"), I get no error. But my code will process many different JSON documents, so I can't do it manually. Is it possible to set up the parser to ignore such trailing commas?

5 answers
闹够了就滚
#2 · 2020-04-14 07:45

I suspect it doesn't parse because it isn't valid JSON, but you could pre-process the strings, using a regular expression to replace , } with } and , ] with ]
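A minimal sketch of that pre-processing idea, assuming no string value in the data contains a comma followed by a closing bracket. The names TRAILING_COMMA and strip_trailing_commas_naive are made up for illustration:

```python
import json
import re

# A comma, optional whitespace, then a closing brace or bracket.
# The closing character is captured so it can be put back.
TRAILING_COMMA = re.compile(r",\s*([}\]])")

def strip_trailing_commas_naive(text):
    """Drop commas that sit directly before } or ].

    Naive: the pattern cannot tell whether the match is inside a
    string value, so it may alter the data in that case.
    """
    return TRAILING_COMMA.sub(r"\1", text)

s = '{ "key1": "value1", "key2": [1, 2, ], }'
data = json.loads(strip_trailing_commas_naive(s))
```

The caveat is real: a value such as "x, }" would be rewritten to "x}", so this only suits inputs you trust to be simple.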

迷人小祖宗
#3 · 2020-04-14 07:50

That's because a trailing , is invalid according to the JSON standard:

An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).


If you really need this, you could wrap Python's json parser with jsoncomment. But I would try to fix the JSON at its origin.

来,给爷笑一个
#4 · 2020-04-14 07:56

The JSON specification doesn't allow a trailing comma. The parser raises an error because it encounters an invalid token.

You might be interested in using a different parser for those files, e.g. a parser built for the JSON5 spec, which allows such syntax.

我想做一个坏孩纸
#5 · 2020-04-14 07:59

How about using the following regex?

import re
s = re.sub(r",\s*}", "}", s)
Melony?
#6 · 2020-04-14 08:02

It could be that this data stream is JSON5, in which case there's a parser for that: https://pypi.org/project/json5/
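As a rough sketch of how that could be wired in, assuming the third-party json5 package is installed (pip install json5): try the strict standard-library parser first and fall back to json5.loads only when it fails. The helper name loads_lenient is hypothetical, not part of either library:

```python
import json

try:
    import json5  # third-party package, assumed to be installed
except ImportError:
    json5 = None

def loads_lenient(text):
    """Parse strict JSON if possible, otherwise try JSON5."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        if json5 is None:
            raise  # no fallback parser available
        return json5.loads(text)
```

Trying the standard parser first keeps well-formed input on the fast, strict path and uses the more permissive parser only for the documents that need it.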

This situation can be alleviated by a regex substitution that looks for ", } and replaces it with " }, allowing for any amount of whitespace between the quote, the comma and the closing curly brace.

>>> import re
>>> s = '{ "key1": "value1", "key2": "value2", }'
>>> re.sub(r"\"\s*,\s*\}", "\" }", s)
'{ "key1": "value1", "key2": "value2" }'

Giving:

>>> import json
>>> s2 = re.sub(r"\"\s*,\s*\}", "\" }", s)
>>> json.loads(s2)
{'key1': 'value1', 'key2': 'value2'}

EDIT: as commented, this is not good practice unless you are confident that your JSON data contains only simple words and that this change does not corrupt the data stream further. As I commented on the OP, the best course of action is to repair the upstream data source. But sometimes that's not possible.
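If the data may contain commas and braces inside string values, one way to limit the corruption risk is to track whether the scanner is inside a string and only drop commas found outside of one. This is a sketch, not a full JSON5 parser, and the function name strip_trailing_commas is made up for illustration:

```python
import json

def strip_trailing_commas(text):
    """Remove commas that directly precede } or ], ignoring string contents."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False      # this character was escaped; skip checks
            elif ch == "\\":
                escaped = True       # next character is escaped
            elif ch == '"':
                in_string = False    # closing quote
            continue
        if ch == '"':
            in_string = True
        elif ch in "}]":
            # Look back past whitespace for a comma outside any string.
            i = len(out) - 1
            while i >= 0 and out[i] in " \t\r\n":
                i -= 1
            if i >= 0 and out[i] == ",":
                del out[i]
        out.append(ch)
    return "".join(out)

s = '{ "key1": "value1", "key2": "value2", }'
cleaned = strip_trailing_commas(s)
```

Here cleaned parses with json.loads, and a value such as "x, }" passes through unchanged because the scanner never edits text between quotes.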
