I'm trying to parse JavaScript object literals that hold huge arrays and convert them to Python dictionaries with lists.
At the moment I'm using PyYAML, but that didn't work directly, as it can't handle consecutive commas (e.g. it breaks on '[,,,0,]' with: expected the node content, but found ','). So I substitute empty strings into those gaps with a regex before parsing, but this is all very slow. I'm wondering if any of you know of a better and faster way to do this. Decoding it as JSON doesn't work either, as the JavaScript code isn't valid JSON.
This is the code I'm using, as described above, with js_obj as an example:
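import re
import yaml

js_obj = "{index: '37',data: [, 1, 2, 3,,,]}"

def repl(match):
    # Strip spaces so that only brackets and commas remain in the match.
    content = re.sub(" ", "", match.group(0))
    length = len(content) - 1
    result = ''
    if content[0] == '[':
        # Leading comma right after '[': start the list with an empty string.
        result = '[""'
        length -= 1
    after = ','
    if content[-1] == ']':
        # Comma right before ']': end the list with an empty string.
        length -= 1
        after += '""]'
    # Fill every remaining gap between consecutive commas with "".
    return result + (',""' * length) + after

# safe_load is used because recent PyYAML releases require an explicit Loader for yaml.load.
py_dict = yaml.safe_load(re.sub(r'\[? *(, *)+\]?', repl, js_obj))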
You should probably write the data out from JavaScript as JSON and then read that JSON into Python. YAML is OK, but I tend to prefer JSON over YAML; JSON is more consistent.
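As a rough sketch of the JSON route: this assumes the JavaScript side can serialize the object itself (for example with JSON.stringify, which writes array holes as null), and the payload string below is a made-up stand-in for that output, not something taken from the question.

```python
import json

# Hypothetical payload: roughly what JSON.stringify would emit for the
# object in the question. Holes in the array are serialized as null.
payload = '{"index": "37", "data": [null, 1, 2, 3, null, null]}'

# json.loads maps JSON objects to dicts, arrays to lists and null to None.
py_dict = json.loads(payload)

print(py_dict["data"])  # [None, 1, 2, 3, None, None]
```

No regex preprocessing is needed in this case, which is likely where most of the time is going at the moment.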
If you must parse the JavaScript, you might want to look into pyparsing or similar.
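For the "or similar" case, one option is a small hand-written recursive-descent parser. The sketch below uses only the standard library rather than pyparsing, and the names TOKEN, CONSTANTS, tokenize and parse_js_literal are hypothetical. It assumes the input is a plain literal: quoted or unquoted keys, numbers, strings without escape sequences, true/false/null, and arrays with elided elements. Holes become None and a single trailing comma adds no element, which is how JavaScript itself counts them.

```python
import re

# One token per match: punctuation, number, quoted string or bare name.
TOKEN = re.compile(r"""
    \s*(?:
        (?P<punct>[{}\[\]:,])
      | (?P<number>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
      | (?P<string>'[^']*'|"[^"]*")
      | (?P<name>[A-Za-z_$][\w$]*)
    )""", re.VERBOSE)

CONSTANTS = {"true": True, "false": False, "null": None}

def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_js_literal(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(expected=None):
        nonlocal pos
        kind, tok = peek()
        if kind is None:
            raise ValueError("unexpected end of input")
        if expected is not None and (kind, tok) != ("punct", expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return kind, tok

    def value():
        kind, tok = take()
        if kind == "number":
            return float(tok) if any(c in tok for c in ".eE") else int(tok)
        if kind == "string":
            return tok[1:-1]  # assumption: no escape sequences to decode
        if kind == "name" and tok in CONSTANTS:
            return CONSTANTS[tok]
        if (kind, tok) == ("punct", "{"):
            return obj()
        if (kind, tok) == ("punct", "["):
            return array()
        raise ValueError(f"unexpected token {tok!r}")

    def array():
        items = []
        while peek() != ("punct", "]"):
            if peek() == ("punct", ","):
                take(",")
                items.append(None)  # elided element (hole)
                continue
            items.append(value())
            if peek() != ("punct", "]"):
                take(",")
        take("]")
        return items

    def obj():
        result = {}
        while peek() != ("punct", "}"):
            kind, key = take()
            if kind == "string":
                key = key[1:-1]
            elif kind not in ("name", "number"):
                raise ValueError(f"bad object key {key!r}")
            take(":")
            result[key] = value()
            if peek() != ("punct", "}"):
                take(",")
        take("}")
        return result

    result = value()
    if pos != len(tokens):
        raise ValueError("trailing input after literal")
    return result

js_obj = "{index: '37',data: [, 1, 2, 3,,,]}"
print(parse_js_literal(js_obj))
# {'index': '37', 'data': [None, 1, 2, 3, None, None]}
```

This walks the input once and builds the dict and lists directly, so there is no regex substitution pass before parsing. Whether it is actually faster on huge arrays would need measuring, and anything beyond plain literals (comments, expressions, escaped quotes) is outside what this sketch handles.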