Serializing a nested namedtuple into JSON with Pyt

2019-04-21 15:26发布

问题:

I have a problem similar to CalvinKrishy's problem Samplebias's solution is not working with the data I have.

I am using Python 2.7.

Here's the data:

Namedtuple

>>> a_t = namedtuple('a','f1 words')
>>> word_t = namedtuple('word','f2 value')
>>> w1 = word_t(f2=[0,1,2], value='abc')
>>> w2 = word_t(f2=[3,4], value='def')
>>> a1 = a_t(f1=[0,1,2,3,4],words=[w1, w2])
>>> a1
a(f1=[0, 1, 2, 3, 4], words=[word(f2=[0, 1, 2], value='abc'), word(f2=[3, 4], value='def')])

Dict

>>> w3 = {}
>>> w3['f2'] = [0,1,2]
>>> w3['value'] = 'abc'
>>> w4 = {}
>>> w4['f2'] = [3,4]
>>> w4['value'] = 'def'
>>> a2 = {}
>>> a2['f1'] = [0, 1, 2, 3, 4]
>>> a2['words'] = [w3,w4]
>>> a2
{'f1': [0, 1, 2, 3, 4], 'words': [{'f2': [0, 1, 2], 'value': 'abc'}, {'f2': [3, 4], 'value': 'def'}]}

As you can see that both a1 and a2 are same except that one is namedtuple and other is dict.

But the json.dumps is different:

>>> json.dumps(a1._asdict())
'{"f1": [0, 1, 2, 3, 4], "words": [[[0, 1, 2], "abc"], [[3, 4], "def"]]}'
>>> json.dumps(a2)
'{"f1": [0, 1, 2, 3, 4], "words": [{"f2": [0, 1, 2], "value": "abc"}, {"f2": [3, 4], "value": "def"}]}'

I want to have json format of a1 exactly like its doing for a2.

回答1:

The problem is in the use of namedtuple._asdict, not json.dumps. If you look at the code with namedtuple(..., verbose=True) you will see this:

def _asdict(self):
    'Return a new OrderedDict which maps field names to their values'
    return OrderedDict(zip(self._fields, self))

Only the top level is actually changed into an OrderedDict, all contained elements are left untouched. This means that nested namedtuples still are tuple subclasses and get (correctly) serialized and such.

If the call to a specific conversion function is acceptable for you (like the call to _asdict), you can write your own one.

def namedtuple_asdict(obj):
  if hasattr(obj, "_asdict"): # detect namedtuple
    return OrderedDict(zip(obj._fields, (namedtuple_asdict(item) for item in obj)))
  elif isinstance(obj, basestring): # iterables - strings
     return obj
  elif hasattr(obj, "keys"): # iterables - mapping
     return OrderedDict(zip(obj.keys(), (namedtuple_asdict(item) for item in obj.values())))
  elif hasattr(obj, "__iter__"): # iterables - sequence
     return type(obj)((namedtuple_asdict(item) for item in obj))
  else: # non-iterable cannot contain namedtuples
    return obj

json.dumps(namedtuple_asdict(a1))
# prints '{"f1": [0, 1, 2, 3, 4], "words": [{"f2": [0, 1, 2], "value": "abc"}, {"f2": [3, 4], "value": "def"}]}'

As you can see, the biggest problem is having nested structures which are not namedtuples but could contain them.



回答2:

Here's the version I went with, adapted from MisterMiyagi's. I used isinstance with collections.abc instead of hasattr, and I incuded a _type key in the resulting dict with the name of the namedtuple class.

import collections.abc

def _nt_to_dict(obj):
    recurse = lambda x: map(_nt_to_dict, x)
    obj_is = lambda x: isinstance(obj, x)
    if obj_is(tuple) and hasattr(obj, '_fields'):  # namedtuple
        fields = zip(obj._fields, recurse(obj))
        class_name = obj.__class__.__name__
        return dict(fields, **{'_type': class_name})
    elif obj_is(collections.abc.Mapping):
        return type(obj)(zip(obj.keys(), recurse(obj.values())))
    elif obj_is(collections.abc.Iterable) and not obj_is(str):
        return type(obj)(recurse(obj))
    else:
        return obj