Fastest way to convert JavaScript object/array to

Posted 2019-08-05 07:08

Question:

I'm trying to parse the code of JavaScript objects that hold huge JavaScript arrays and convert it to a Python dictionary with lists.

At the moment I'm using PyYAML, but it doesn't work directly because it can't handle consecutive commas (e.g. it breaks on '[,,,0,]' with: expected the node content, but found ','). So I substitute these out first, but this is all very slow. I'm wondering if anyone knows a better and faster way to do this. JSON decoding doesn't work either, since the JavaScript code isn't valid JSON.

This is the code I'm using, as described above, with js_obj as an example:

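import re
import yaml  # PyYAML

js_obj = "{index: '37',data: [, 1, 2, 3,,,]}"

def repl(match):
    # Rewrite one run of commas, filling each gap with an empty string ("").
    content = match.group(0).replace(" ", "")
    length = len(content) - 1
    result = ''
    if content[0] == '[':
        # The run opens the array, so the first element is missing.
        result = '[""'
        length -= 1

    after = ','
    if content[-1] == ']':
        # The run closes the array, so fill the slot before the bracket.
        length -= 1
        after += '""]'

    return result + (',""' * length) + after

# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load.
py_dict = yaml.safe_load(re.sub(r'\[? *(, *)+\]?', repl, js_obj))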

Answer 1:

You should probably serialize the data to JSON on the JavaScript side and then parse that JSON in Python. YAML is OK, but I tend to prefer JSON; it is more consistent.
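As a rough illustration of the JSON route, the sketch below assumes the JavaScript side calls JSON.stringify on the object, which writes array holes as null. The json_text string is a hypothetical payload built from the js_obj example in the question, not real output captured from any program.

```python
import json

# Hypothetical payload: what JSON.stringify would be expected to emit for
# {index: '37', data: [, 1, 2, 3,,,]} (holes in the array become null).
json_text = '{"index":"37","data":[null,1,2,3,null,null]}'

# json.loads maps JSON objects to dicts, arrays to lists, and null to None.
py_dict = json.loads(json_text)
print(py_dict["data"])
```

The standard-library json module is typically much faster than a regex substitution followed by YAML parsing, but this only helps if you control how the JavaScript data is written.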

If you must parse the JavaScript, you might want to look into pyparsing or similar.
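If the JavaScript source really has to be parsed, one option in the "or similar" category is a small hand-written recursive-descent parser. The sketch below uses only the standard library rather than pyparsing. It assumes the input is a plain object/array literal with unquoted or quoted keys, strings, numbers, true/false/null, and array holes. The names tokenize, _Parser, and parse_js_literal are made up for this example, and holes are mapped to None (the code in the question uses "" instead).

```python
import re

# Tokens: punctuation, quoted strings, numbers, bare names (keys, true/false/null).
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<punct>[{}\[\]:,])
      | (?P<string>'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")
      | (?P<number>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
      | (?P<name>[A-Za-z_$][\w$]*)
    )
""", re.VERBOSE)

NAMED_VALUES = {"true": True, "false": False, "null": None, "undefined": None}


def tokenize(src):
    """Split the source into (kind, text) pairs."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected input near offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class _Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def is_punct(self, char):
        return self.peek() == ("punct", char)

    def value(self):
        kind, text = self.take()
        if (kind, text) == ("punct", "{"):
            return self.obj()
        if (kind, text) == ("punct", "["):
            return self.array()
        if kind == "string":
            return text[1:-1]  # escape sequences are left undecoded in this sketch
        if kind == "number":
            return int(text) if re.fullmatch(r"-?\d+", text) else float(text)
        if kind == "name" and text in NAMED_VALUES:
            return NAMED_VALUES[text]
        raise ValueError(f"unexpected token {text!r}")

    def array(self):
        items = []
        while not self.is_punct("]"):
            if self.is_punct(","):
                self.take()
                items.append(None)  # a comma with no value before it is a hole
                continue
            items.append(self.value())
            if self.is_punct(","):
                self.take()  # separator; a trailing comma adds no element
            elif not self.is_punct("]"):
                raise ValueError("expected ',' or ']'")
        self.take()  # consume ']'
        return items

    def obj(self):
        result = {}
        while not self.is_punct("}"):
            kind, text = self.take()
            if kind == "string":
                key = text[1:-1]
            elif kind in ("name", "number"):
                key = text
            else:
                raise ValueError(f"bad object key {text!r}")
            if self.take() != ("punct", ":"):
                raise ValueError("expected ':' after key")
            result[key] = self.value()
            if self.is_punct(","):
                self.take()
            elif not self.is_punct("}"):
                raise ValueError("expected ',' or '}'")
        self.take()  # consume '}'
        return result


def parse_js_literal(src):
    parser = _Parser(tokenize(src))
    result = parser.value()
    if parser.i != len(parser.tokens):
        raise ValueError("unexpected trailing input")
    return result
```

With the example from the question, parse_js_literal("{index: '37',data: [, 1, 2, 3,,,]}") returns {'index': '37', 'data': [None, 1, 2, 3, None, None]}. The list has six entries, matching the length JavaScript itself gives that array. Whether this beats the regex-plus-YAML approach on huge arrays would need to be measured on real data.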