I would like to parse JSON-like strings. Their only difference from normal JSON is that arrays may contain consecutive commas. Two consecutive commas implicitly mean that null should be inserted between them. Example:
JSON-like: ["foo",,,"bar",[1,,3,4]]
JavaScript: ["foo",null,null,"bar",[1,null,3,4]]
Decoded (Python): ["foo", None, None, "bar", [1, None, 3, 4]]
The native json.JSONDecoder class doesn't let me change how arrays are parsed. I can only customize the parsing of objects (dicts), ints, floats and constants, by passing hook functions as keyword arguments to JSONDecoder() (please see the docs).
So, does this mean I have to write a JSON parser from scratch? The Python source of json is available, but it's quite a mess. I would prefer to reuse its internals rather than duplicate its code!
You can do the comma replacement of Lattyware's/przemo_li's answers in one pass by using a lookbehind expression, i.e. "replace every comma that is immediately preceded by another comma".
Note that this only works for simple inputs where you can assume, for example, that no string literal contains consecutive commas. In general, regular expressions aren't enough to handle this problem, and Taymon's approach of using a real parser is the only fully correct solution.
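As a rough illustration of the lookbehind idea, here is a minimal sketch. The function name fill_nulls_lookbehind is hypothetical, and the pattern assumes that empty slots only occur between two commas and never inside string literals:

```python
import json
import re

def fill_nulls_lookbehind(text):
    # Hypothetical helper: match each comma (and any whitespace before it)
    # that directly follows another comma, and put "null" in front of it.
    return re.sub(r"(?<=,)\s*,", "null,", text)

fixed = fill_nulls_lookbehind('["foo",,,"bar",[1,,3,4]]')
print(fixed)              # ["foo",null,null,"bar",[1,null,3,4]]
print(json.loads(fixed))  # ['foo', None, None, 'bar', [1, None, 3, 4]]
```

Because the lookbehind inspects the original string rather than the already-replaced output, a single pass is enough even for runs of three or more commas.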
It's a hackish way of doing it, but one solution is to simply do some string modification on the JSON-ish data to get it in line before parsing it.
Which leaves us with:
We can then do:
Giving us:
Note that it's not as simple as a single replace, because each replacement inserts commas that may themselves need replacing. Given this, you have to loop until no more replacements can be made. Here I have used a simple regex to do the job.
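The looping replacement described above could be sketched as follows. This is not necessarily the original code from this answer; the function name fill_nulls_loop and the exact pattern are assumptions, and like any regex approach it will also touch consecutive commas inside string literals:

```python
import json
import re

# Two commas separated only by optional whitespace.
EMPTY_SLOT = re.compile(r",\s*,")

def fill_nulls_loop(text):
    # Hypothetical helper: repeat the substitution until nothing changes,
    # since one pass over ",,," only fixes the first pair of commas.
    while True:
        replaced = EMPTY_SLOT.sub(",null,", text)
        if replaced == text:
            return replaced
        text = replaced

fixed = fill_nulls_loop('["foo",,,"bar",[1,,3,4]]')
print(json.loads(fixed))  # ['foo', None, None, 'bar', [1, None, 3, 4]]
```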
I've had a look at Taymon's recommendation, pyparsing, and I successfully hacked the example provided here to suit my needs. It works well at simulating JavaScript's eval() but fails in one situation: trailing commas. There should be an optional trailing comma (see tests below), but I can't find a proper way to implement this.
Since what you're trying to parse isn't JSON per se, but rather a different language that's very much like JSON, you may need your own parser.
Fortunately, this isn't as hard as it sounds. You can use a Python parser generator like pyparsing. JSON can be fully specified with a fairly simple context-free grammar (I found one here), so you should be able to modify it to fit your needs.
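If a full grammar feels like more than you need, a hand-written pre-pass that tracks whether it is inside a string literal may already cover the case in the question. The sketch below is plain Python, not pyparsing, and is only one possible approach. The name fill_empty_slots is hypothetical, and it assumes that only empty slots between two commas need handling (no leading or trailing commas):

```python
import json

def fill_empty_slots(text):
    # Hypothetical helper: insert "null" between consecutive commas,
    # leaving the contents of string literals untouched.
    out = []
    in_string = False  # currently inside a "..." literal
    escaped = False    # previous character was a backslash inside a string
    last = ""          # last non-whitespace character seen outside strings
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
                last = '"'
            continue
        if ch == '"':
            in_string = True
        elif ch == "," and last == ",":
            out.append("null")  # empty slot between two commas
        out.append(ch)
        if not ch.isspace():
            last = ch
    return "".join(out)

print(json.loads(fill_empty_slots('["a,,b",,,"bar",[1,,3,4]]')))
# ['a,,b', None, None, 'bar', [1, None, 3, 4]]
```

The cleaned string is then handed to json.loads(), so the standard decoder still does the actual parsing.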
Small & simple workaround to try out:
Let JSONDecoder() do the heavy lifting.
(And if converting to string is impractical, update your question with this info!)
For those looking for something quick and dirty to convert general JS objects to dicts: part of a page on a real site gives me an object I'd like to parse. It contains 'new' constructs for dates, and it's all on one line with no spaces in between, so two lines suffice:
Then json.loads() worked fine. Your mileage may vary:)