可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm trying to load a large file (2GB in size) filled with JSON strings, delimited by newlines. Ex:

{
    "key11": value11,
    "key12": value12,
}
{
    "key21": value21,
    "key22": value22,
}
…

The way I'm importing it now is:

content = open(file_path, "r").read() 
j_content = json.loads("[" + content.replace("}\n{", "},\n{") + "]")

Which seems like a hack (adding commas between each JSON string and also a beginning and ending square bracket to make it a proper list).

Is there a better way to specify the JSON delimiter (newline \n instead of comma ,)?

Also, Python can't seem to properly allocate memory for an object built from 2GB of data, is there a way to construct each JSON object as I'm reading the file line by line? Thanks!

回答1:

Just read each line and construct a json object at this time:

with open(file_path) as f:
    for line in f:
        j_content = json.loads(line)

This way, you load proper complete json object (provided there is no \n in a json value somewhere or in the middle of your json object) and you avoid memory issue as each object is created when needed.

There is also this answer.:

https://stackoverflow.com/a/7795029/671543

回答2:

This will work for the specific file format that you gave. If your format changes, then you'll need to change the way the lines are parsed.

{
    "key11": 11,
    "key12": 12
}
{
    "key21": 21,
    "key22": 22
}

Just read line-by-line, and build the JSON blocks as you go:

with open(args.infile, 'r') as infile:

    # Variable for building our JSON block
    json_block = []

    for line in infile:

        # Add the line to our JSON block
        json_block.append(line)

        # Check whether we closed our JSON block
        if line.startswith('}'):

            # Do something with the JSON dictionary
            json_dict = json.loads(''.join(json_block))
            print(json_dict)

            # Start a new block
            json_block = []

If you are interested in parsing one very large JSON file without saving everything to memory, you should look at using the object_hook or object_pairs_hook callback methods in the json.load API.

回答3:

contents = open(file_path, "r").read() 
data = [json.loads(str(item)) for item in contents.strip().split('\n')]

How to read line-delimited JSON from large file &#

问题:

回答1:

回答2:

回答3:

收藏的人(0)

How to read line-delimited JSON from large file &#

问题:

回答1:

回答2:

回答3:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮