This code:
import json
s = '{ "key1": "value1", "key2": "value2", }'
json.loads(s)
produces this error in Python 2:
ValueError: Expecting property name: line 1 column 16 (char 15)
Similar result in Python 3:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 16 (char 15)
If I remove that trailing comma (after "value2"), I get no error. But my code will process many different JSON documents, so I can't do it manually. Is it possible to set up the parser to ignore such trailing commas?
The JSON specification doesn't allow a trailing comma. The parser raises an error because it encounters an invalid token.
You might be interested in using a different parser for those files, e.g. a parser built for the JSON5 spec, which allows such syntax.
It could be that this data stream is JSON5, in which case there's a parser for that: https://pypi.org/project/json5/
This situation can be alleviated by a regex substitution that looks for `", }` and replaces it with `" }`, allowing for any amount of whitespace between the quotes, comma and close-curly.
>>> import re
>>> s = '{ "key1": "value1", "key2": "value2", }'
>>> re.sub(r"\"\s*,\s*\}", "\" }", s)
'{ "key1": "value1", "key2": "value2" }'
Giving:
>>> import json
>>> s2 = re.sub(r"\"\s*,\s*\}", "\" }", s)
>>> json.loads(s2)
{'key1': 'value1', 'key2': 'value2'}
EDIT: as commented, this is not good practice unless you are confident that your JSON data contains only simple words and that this change does not corrupt the data stream further. As I commented on the OP, the best course of action is to repair the upstream data source. But sometimes that's not possible.
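To illustrate the corruption risk mentioned above, here is a minimal sketch of a pre-processor that only drops commas found outside string literals, so a value such as "a, }" survives untouched. The function name strip_trailing_commas is hypothetical, and the sketch assumes the input is otherwise valid JSON; it is not a full parser.

```python
import json


def strip_trailing_commas(text):
    """Drop commas that directly precede } or ] outside string literals."""
    out = []
    in_string = False   # True while scanning inside a "..." literal
    escaped = False     # True right after a backslash inside a string
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ",":
            # Look past whitespace to see what follows the comma.
            j = i + 1
            while j < n and text[j] in " \t\r\n":
                j += 1
            # Keep the comma unless the next token closes an object or array.
            if not (j < n and text[j] in "}]"):
                out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


s = '{ "key1": "value1", "key2": "a, }", "list": [1, 2, ], }'
data = json.loads(strip_trailing_commas(s))
print(data)
```

Because the scanner tracks whether it is inside a string, the ", }" sequence inside the "key2" value is left alone, while the two real trailing commas are removed.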
That's because an extra `,` is invalid according to the JSON standard.
An object is an unordered set of name/value pairs. An object begins
with { (left brace) and ends with } (right brace). Each name is
followed by : (colon) and the name/value pairs are separated by ,
(comma).
If you really need this, you could wrap Python's json parser with jsoncomment. But I would try to fix the JSON at its origin.
I suspect it doesn't parse because "it's not JSON", but you could pre-process the strings, using a regular expression to replace `, }` with `}` and `, ]` with `]`.
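The two replacements described above can be folded into a single pattern. This is a rough sketch, with drop_trailing_commas as a hypothetical helper name; like any regex approach, it would also rewrite a matching sequence that happens to sit inside a string value, so it only suits data where that cannot occur.

```python
import json
import re

# A comma, optional whitespace, then a closing brace or bracket.
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def drop_trailing_commas(text):
    # Keep the closing character (group 1) and discard the comma and whitespace.
    return TRAILING_COMMA.sub(r"\1", text)


s = '{ "key1": "value1", "key2": [1, 2, ], }'
result = json.loads(drop_trailing_commas(s))
print(result)
```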
How about using the following regex?
s = re.sub(r",\s*}", "}", s)
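One possible way to limit the damage such a substitution can do is to try a strict parse first and apply the regex only when that fails, so documents that are already valid are never modified. The wrapper name loads_lenient is an assumption made for this sketch. It catches ValueError, which covers both the Python 2 error and Python 3's json.JSONDecodeError (a subclass of ValueError).

```python
import json
import re


def loads_lenient(text):
    """Parse strictly; fall back to removing trailing commas before }."""
    try:
        return json.loads(text)
    except ValueError:
        # Only reached for input the strict parser rejected.
        repaired = re.sub(r",\s*}", "}", text)
        return json.loads(repaired)


valid = loads_lenient('{"note": "keep this, }"}')
fixed = loads_lenient('{ "key1": "value1", "key2": "value2", }')
print(valid, fixed)
```

If the repaired text is still invalid, the second json.loads call raises as usual, so genuinely broken input is not silently accepted.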