Python decode nested JSON in JSON

I'm dealing with an API that unfortunately is returning malformed (or "weirdly formed," rather -- thanks @fjarri) JSON, but on the positive side I think it may be an opportunity for me to learn something about recursion as well as JSON. It's for an app I use to log my workouts; I'm trying to make a backup script.

I can receive the JSON fine, but even after requests.get(api_url).json() (or json.loads(requests.get(api_url).text)), one of the values is still a JSON-encoded string. Luckily, I can just json.loads() that string and it properly decodes to a dict. The key is predictable (timezone_id), but its value varies because data has been logged in multiple timezones. For example, after the first decoding pass it might be dumped to a file as "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}", or loaded into Python as 'timezone_id': '{"name":"America/Denver","seconds":"-21600"}'.
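A minimal sketch of the two decoding passes described above, using a trimmed-down record shaped like the example data (the raw string is a hypothetical stand-in for the API response body):

```python
import json

# One record shaped like the API response: timezone_id holds a JSON document
# that was serialized to a string before the outer object was encoded.
raw = r'{"timestamp": 1353193745000, "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}'

record = json.loads(raw)          # first pass: decodes the outer object
tz = record["timezone_id"]
print(type(tz).__name__)          # str (still encoded)

tz = json.loads(tz)               # second pass: decodes the embedded document
print(tz["name"], tz["seconds"])  # America/Denver -21600
```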

The problem is that I'm using this API to retrieve a fair amount of data with several layers of nested dicts and lists, and the double-encoded timezone_id values occur at multiple levels.

Here's my work so far with some example data, but it seems like I'm pretty far off base.

#! /usr/bin/env python3

import json
from pprint import pprint

my_input = r"""{
    "hasMore": false,
    "checkins": [
        {
            "timestamp": 1353193745000,
            "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
            "privacy_groups": [
                "private"
            ],
            "meta": {
                "client_version": "3.0",
                "uuid": "fake_UUID"
            },
            "client_id": "fake_client_id",
            "workout_name": "Workout (Nov 17, 2012)",
            "fitness_workout_json": {
                "exercise_logs": [
                    {
                        "timestamp": 1353195716000,
                        "type": "exercise_log",
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    },
                    {
                        "timestamp": 1353195340000,
                        "type": "exercise_log",
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    }
                ]
            },
            "workout_uuid": ""
        },
        {
            "timestamp": 1354485615000,
            "user_id": "fake_ID",
            "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
            "privacy_groups": [
                "private"
            ],
            "meta": {
                "uuid": "fake_UUID"
            },
            "created": 1372023457376,
            "workout_name": "Workout (Dec 02, 2012)",
            "fitness_workout_json": {
                "exercise_logs": [
                    {
                        "timestamp": 1354485615000,
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    },
                    {
                        "timestamp": 1354485584000,
                        "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
                        "workout_log_uuid": "fake_UUID"
                    }
                ]
            },
            "workout_uuid": ""
        }]}"""

def recurse(obj):
    if isinstance(obj, list):
        for item in obj:
            return recurse(item)
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, str):
                try:
                    v = json.loads(v)
                except ValueError:
                    pass
                obj.update({k: v})
            elif isinstance(v, (dict, list)):
                return recurse(v)

pprint(json.loads(my_input, object_hook=recurse))

Any suggestions for a good way to json.loads() all those double-encoded values without changing the rest of the object? Many thanks in advance!

This post seems to be a good reference: Modifying Deeply-Nested Structures

Edit: This was flagged as a possible duplicate of this question. I think it's fairly different, as I've already shown that a single json.loads() call was not enough. The solution ended up requiring an object_hook, which I had never needed when decoding JSON and which is not addressed in the prior question.

Tags: python json recursion

2 answers

ゆ、 Hurt°

Post #2 · 2020-05-29 06:20

So, the object_hook in the JSON loader is called each time the loader finishes constructing a dictionary. That is, the first thing it is called on is the innermost dictionary, and it works outwards from there.
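A small trace of the order in which the decoder calls the hook (the tracing_hook name and the sample document are made up for illustration). Lists are never passed to the hook; only the dicts inside them are:

```python
import json

calls = []

def tracing_hook(obj):
    # Record the keys of each dict at the moment the decoder hands it over.
    calls.append(sorted(obj))
    return obj

json.loads('{"outer": {"middle": {"inner": 1}}, "items": [{"a": 1}]}',
           object_hook=tracing_hook)

for keys in calls:
    print(keys)
# ['inner']
# ['middle']
# ['a']
# ['items', 'outer']
```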

The dictionary passed to the object_hook callback is replaced by whatever that function returns.
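To see the replacement behaviour in isolation, here is a hypothetical hook that swaps any dict with exactly the keys name and seconds for a tuple (the tuple shape is only for illustration):

```python
import json

def tz_to_tuple(obj):
    # Whatever the hook returns takes the dict's place in its parent.
    if set(obj) == {"name", "seconds"}:
        return (obj["name"], int(obj["seconds"]))
    return obj

data = json.loads('{"checkins": [{"tz": {"name": "America/Denver", "seconds": "-21600"}}]}',
                  object_hook=tz_to_tuple)
print(data)  # {'checkins': [{'tz': ('America/Denver', -21600)}]}
```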

So, you don't need to recurse yourself: by its nature, the loader hands you the innermost objects first.

I think this will work for you:

def hook(obj):
    value = obj.get("timezone_id")
    # this is python 3 specific; I would check isinstance against 
    # basestring in python 2
    if value and isinstance(value, str):
        obj["timezone_id"] = json.loads(value, object_hook=hook)
    return obj
data = json.loads(my_input, object_hook=hook)

It seems to have the effect I think you're looking for when I test it.
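A self-contained check of the hook above against a trimmed copy of the question's data (only the fields that matter here are kept, and the hook is repeated so the snippet runs on its own):

```python
import json

def hook(obj):
    # Same logic as the hook above: decode timezone_id only while it is still a string.
    value = obj.get("timezone_id")
    if value and isinstance(value, str):
        obj["timezone_id"] = json.loads(value, object_hook=hook)
    return obj

sample = r'''{"checkins": [{"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}",
  "fitness_workout_json": {"exercise_logs": [
    {"timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}]}}]}'''

data = json.loads(sample, object_hook=hook)
checkin = data["checkins"][0]
print(checkin["timezone_id"]["name"])  # America/Denver
print(checkin["fitness_workout_json"]["exercise_logs"][0]["timezone_id"]["seconds"])  # -21600
```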

I probably wouldn't try to decode every string value. I would call json.loads() only where you expect a double-encoded JSON object to exist. If you try to decode every string, you might accidentally decode something that is supposed to stay a string (like the string "12345" when the API intends it to be a string).
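One way to follow that advice while leaving room for more than one affected key is a hook driven by an explicit allow-list. In this sketch DOUBLE_ENCODED_KEYS, targeted_hook and the order_id field are all hypothetical names chosen for illustration:

```python
import json

# Hypothetical allow-list of keys the API is known to double-encode.
DOUBLE_ENCODED_KEYS = {"timezone_id"}

def targeted_hook(obj):
    # Decode only the listed keys; every other string is left exactly as sent.
    for key in obj.keys() & DOUBLE_ENCODED_KEYS:
        if isinstance(obj[key], str):
            obj[key] = json.loads(obj[key])
    return obj

doc = r'{"order_id": "12345", "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}'
print(json.loads(doc, object_hook=targeted_hook))
# {'order_id': '12345', 'timezone_id': {'name': 'America/Denver', 'seconds': '-21600'}}
```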

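Also, your existing function is more complicated than it needs to be. It might work as an object_hook if you always returned obj (whether you update its contents or not) and also removed the early return recurse(...) statements, which exit the loop at the first nested dict or list and hand back the wrong object.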
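If you still want to practice the recursion yourself, for example on data that has already been loaded with .json(), here is a sketch of a post-processing walker (the name decode_timezones is made up). Unlike the question's recurse, it returns a value on every path and never returns from inside a loop:

```python
import json

def decode_timezones(node):
    """Return a copy of node with every string timezone_id decoded, at any depth."""
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key == "timezone_id" and isinstance(value, str):
                result[key] = json.loads(value)
            else:
                # Recurse into the value and keep looping over the remaining keys.
                result[key] = decode_timezones(value)
        return result
    if isinstance(node, list):
        return [decode_timezones(item) for item in node]
    return node  # numbers, strings, booleans and None pass through unchanged

loaded = {"checkins": [{"timezone_id": '{"name":"America/Denver","seconds":"-21600"}',
                        "privacy_groups": ["private"]}]}
print(decode_timezones(loaded))
```

Building a new structure instead of mutating in place keeps the original loaded data untouched, which can make a backup script easier to debug.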


The star\"

Post #3 · 2020-05-29 06:37

Your main issue is that your object_hook function should not be recursing. json.loads() takes care of the recursion itself and calls your function every time it finishes decoding a JSON object (so obj will always be a dictionary). Instead, you just want to modify the problematic keys and return the dict. This should do what you are looking for:

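def flatten_hook(obj):
    # Python 3: dict.items() and str replace Python 2's iteritems() and basestring
    for key, value in obj.items():
        if isinstance(value, str):
            try:
                # Decode the string, applying the same hook to whatever it contains
                obj[key] = json.loads(value, object_hook=flatten_hook)
            except ValueError:
                # Not valid JSON, so keep the original string
                pass
    return obj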

pprint(json.loads(my_input, object_hook=flatten_hook))

However, if you know the problematic (double-encoded) entries always take a specific form (e.g. key == 'timezone_id'), it is probably safer to call json.loads() on those keys only, as Matt Anderson suggests in his answer.
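A quick comparison on a record built from two fields of the question's data, using a Python 3 port of the catch-all hook and a key-specific hook (both function names are illustrative). The catch-all version turns "3.0" into a float and "-21600" into an int, which a backup script may not want:

```python
import json

def flatten_hook(obj):
    # Catch-all: try to decode every string value.
    for key, value in obj.items():
        if isinstance(value, str):
            try:
                obj[key] = json.loads(value, object_hook=flatten_hook)
            except ValueError:
                pass
    return obj

def timezone_hook(obj):
    # Targeted: only touch the key known to be double-encoded.
    value = obj.get("timezone_id")
    if isinstance(value, str):
        obj["timezone_id"] = json.loads(value)
    return obj

record = r'{"meta": {"client_version": "3.0"}, "timezone_id": "{\"name\":\"America/Denver\",\"seconds\":\"-21600\"}"}'

broad = json.loads(record, object_hook=flatten_hook)
narrow = json.loads(record, object_hook=timezone_hook)
print(broad)   # {'meta': {'client_version': 3.0}, 'timezone_id': {'name': 'America/Denver', 'seconds': -21600}}
print(narrow)  # {'meta': {'client_version': '3.0'}, 'timezone_id': {'name': 'America/Denver', 'seconds': '-21600'}}
```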

