Controlling YAML Serialization Order in Python

Posted 2019-01-23 05:18

Question:

How do you control the order in which PyYAML outputs key/value pairs when serializing a Python dictionary?

I'm using YAML as a simple serialization format in a Python script. My YAML-serialized objects represent a sort of "document", so for maximum user-friendliness, I'd like my object's "name" field to appear first in the file. Of course, since the value returned by my object's __getstate__ is a dictionary, and Python dictionaries are unordered, the "name" field will be serialized to an arbitrary location in the output.

e.g.

>>> import yaml
>>> class Document(object):
...     def __init__(self, name):
...         self.name = name
...         self.otherstuff = 'blah'
...     def __getstate__(self):
...         return self.__dict__.copy()
... 
>>> doc = Document('obj-20111227')
>>> print(yaml.dump(doc, indent=4))
!!python/object:__main__.Document
otherstuff: blah
name: obj-20111227

Answer 1:

It took me a few hours of digging through the PyYAML docs and tickets, but I eventually discovered a comment that lays out some proof-of-concept code for serializing an OrderedDict as a normal YAML map while maintaining the key order.

Applied to my original code, the solution looks something like:

>>> import yaml
>>> from collections import OrderedDict
>>> def dump_anydict_as_map(anydict):
...     # Register the order-preserving representer for the given class.
...     yaml.add_representer(anydict, _represent_dictorder)
... 
>>> def _represent_dictorder(self, data):
...     # Pass the items rather than the dict itself so PyYAML keeps their order.
...     if isinstance(data, Document):
...         return self.represent_mapping('tag:yaml.org,2002:map', data.__getstate__().items())
...     else:
...         return self.represent_mapping('tag:yaml.org,2002:map', data.items())
... 
>>> class Document(object):
...     def __init__(self, name):
...         self.name = name
...         self.otherstuff = 'blah'
...     def __getstate__(self):
...         d = OrderedDict()
...         d['name'] = self.name
...         d['otherstuff'] = self.otherstuff
...         return d
... 
>>> dump_anydict_as_map(Document)
>>> doc = Document('obj-20111227')
>>> print(yaml.dump(doc, indent=4))
name: obj-20111227
otherstuff: blah
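
The representer above is tied to the Document class. As a minimal sketch of the more general idea (dumping any OrderedDict as a plain YAML map in insertion order), the same represent_mapping call can be registered for OrderedDict itself. The function name `represent_ordered_dict` and the sample keys are illustrative assumptions, not taken from the original comment.

```python
from collections import OrderedDict

import yaml


def represent_ordered_dict(dumper, data):
    # Hand PyYAML the (key, value) pairs rather than the mapping itself;
    # represent_mapping only sorts objects that expose an .items() method.
    return dumper.represent_mapping('tag:yaml.org,2002:map', data.items())


# Registers on the default yaml.Dumper, which yaml.dump() uses.
yaml.add_representer(OrderedDict, represent_ordered_dict)

# Keys are deliberately not in alphabetical order to make the effect visible.
record = OrderedDict([
    ('name', 'obj-20111227'),
    ('otherstuff', 'blah'),
    ('author', 'me'),
])
dumped = yaml.dump(record, default_flow_style=False)
print(dumped)
```

Because the standard map tag is used, a map dumped this way loads back as an ordinary dict rather than as an OrderedDict.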


Answer 2:

Cerin, thanks a lot for your answer; it helped me resolve my issue. But it took me some time to understand the answer, as there was no input dictionary mentioned. So I'm re-posting @cerin's answer with the input dictionary. Here the output is displayed as separate entries, so this approach is good for recursively dumping data to a YAML file in a predefined order.

import yaml
from collections import OrderedDict

input_dict = {"first_key": "fist_value", "second_key": "second_value", "third_key": "third_value"}

def dump_anydict_as_map(anydict):
    # register the order-preserving representer for the given class
    yaml.add_representer(anydict, _represent_dictorder)

def _represent_dictorder(self, data):
    # pass the items rather than the dict itself so the order is kept
    if isinstance(data, Document):
        return self.represent_mapping('tag:yaml.org,2002:map', data.__getstate__().items())
    else:
        return self.represent_mapping('tag:yaml.org,2002:map', data.items())

class Document(object):
    def __init__(self, name): # no need to preserve the order here
        self.first_key = input_dict["first_key"]
        self.second_key = input_dict["second_key"]
        self.third_key = input_dict["third_key"]
    def __getstate__(self): # this is where order should be defined
        d = OrderedDict()
        d['second_key'] = self.second_key
        d['third_key'] = self.third_key
        d['first_key'] = self.first_key
        return d

dump_anydict_as_map(Document)
doc = Document('obj-20111227')
print(yaml.dump([doc], default_flow_style=False))

Output

- second_key: second_value
  third_key: third_value
  first_key: fist_value
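
To illustrate the "recursively dumping to a file" point, here is a rough sketch with nested OrderedDicts written to a stream. It assumes a dedicated dumper subclass (the name `OrderedDumper` is hypothetical) so that the representer is not registered globally, and it uses io.StringIO as a stand-in for an open file.

```python
import io
from collections import OrderedDict

import yaml


class OrderedDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass that keeps the representer local."""


def _represent_ordered(dumper, data):
    # Pairs are emitted in insertion order; nested OrderedDicts go through
    # this same representer when PyYAML recurses into the values.
    return dumper.represent_mapping('tag:yaml.org,2002:map', data.items())


OrderedDumper.add_representer(OrderedDict, _represent_ordered)

nested = OrderedDict([
    ('second_key', 'second_value'),
    ('details', OrderedDict([
        ('third_key', 'third_value'),
        ('first_key', 'first_value'),
    ])),
])

stream = io.StringIO()  # stands in for an open file object
yaml.dump(nested, stream, Dumper=OrderedDumper, default_flow_style=False)
result = stream.getvalue()
print(result)
```

Passing `Dumper=OrderedDumper` on each call keeps the default dumper untouched, which may matter if other code in the same process relies on the stock behaviour.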


Answer 3:

The last time I checked, Python's dictionaries weren't ordered. If you really want them to be, I strongly recommend using a list of key/value pairs.

[
    ('key', 'value'),
    ('key2', 'value2')
]
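
If such a list of pairs still needs to come out as a YAML map, one possible sketch is to wrap it in a small marker type and register a representer for it. The class name `PairList` and the sample values are assumptions made up for illustration.

```python
import yaml


class PairList(list):
    """Hypothetical marker type: a list of (key, value) tuples to emit as a map."""


def represent_pair_list(dumper, pairs):
    # represent_mapping accepts any iterable of (key, value) pairs and
    # leaves it unsorted when the object has no .items() method.
    return dumper.represent_mapping('tag:yaml.org,2002:map', pairs)


yaml.add_representer(PairList, represent_pair_list)

pairs = PairList([
    ('name', 'widget'),
    ('key', 'value'),
    ('key2', 'value2'),
])
as_yaml = yaml.dump(pairs, default_flow_style=False)
print(as_yaml)
```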

Alternatively, define a list with the keys and put them in the right order.

keys = ['key1', 'name', 'price', 'key2']  # keys in the desired output order
for key in keys:
    print(obj[key])  # obj is the dictionary being output
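
The key-list idea can also be combined with yaml.dump directly. The sketch below assumes Python 3.7 or later (where plain dicts keep insertion order) and PyYAML 5.1 or later (which accepts `sort_keys=False`); the contents of `obj` are made-up sample data.

```python
import yaml

# Sample data, stored in no particular order.
obj = {'price': 10, 'key2': 'b', 'name': 'widget', 'key1': 'a'}

# Desired output order.
keys = ['key1', 'name', 'price', 'key2']

# Rebuild the dict in the desired order, then tell PyYAML not to re-sort it.
ordered = {key: obj[key] for key in keys}
text = yaml.dump(ordered, sort_keys=False, default_flow_style=False)
print(text)
```

On older Python or PyYAML versions this will not work as written, and a custom representer is still needed.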


Tags: python yaml