I am trying to make a YAML sequence in Python that creates a custom Python object. The object needs to be constructed with dicts and lists that are deconstructed after __init__. However, it seems that the construct_mapping function does not construct the entire tree of embedded sequences (lists) and dicts.
Consider the following:
import yaml

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l = l
        self.d = d

def foo_constructor(loader, node):
    values = loader.construct_mapping(node)
    s = values["s"]
    d = values["d"]
    l = values["l"]
    return Foo(s, d, l)

yaml.add_constructor(u'!Foo', foo_constructor)

f = yaml.load('''
--- !Foo
s: 1
l: [1, 2]
d: {try: this}''')

print(f)
# prints: 'Foo(1, {'try': 'this'}, [1, 2])'
This works fine because f holds the references to the l and d objects, which are actually filled with data after the Foo object is created.
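The late filling can be observed directly. The following is a minimal sketch, assuming a recent PyYAML where the loader class is passed explicitly; the !Probe tag, probe_constructor and the seen dict are hypothetical names used only for illustration. It records the length of the list at the moment the constructor runs and compares it with the length once loading has finished.

```python
import yaml

seen = {}  # hypothetical helper: records what the constructor saw

def probe_constructor(loader, node):
    # deep defaults to False, so nested lists and dicts are still empty here
    values = loader.construct_mapping(node)
    seen["len_at_construction"] = len(values["l"])
    return values

# '!Probe' is a made-up tag for this sketch
yaml.add_constructor(u'!Probe', probe_constructor, Loader=yaml.SafeLoader)

result = yaml.load('''
--- !Probe
s: 1
l: [1, 2]
''', Loader=yaml.SafeLoader)

print(seen["len_at_construction"])  # length seen inside the constructor
print(len(result["l"]))             # length after loading has finished
```

The list object handed to the constructor is the same one that ends up in the result; it is only populated later, which is why keeping a reference works while unpacking it immediately does not.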
Now, let's do something a smidgen more complicated:
class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        # assume two-value list for l
        self.l1, self.l2 = l
        self.d = d
Now we get the following error:

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    d: {try: this}''')
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 39, in get_single_data
    return self.construct_document(node)
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 43, in construct_document
    data = self.construct_object(node)
  File "/opt/homebrew/lib/python2.7/site-packages/yaml/constructor.py", line 88, in construct_object
    data = constructor(self, node)
  File "test.py", line 19, in foo_constructor
    return Foo(s, d, l)
  File "test.py", line 7, in __init__
    self.l1, self.l2 = l
ValueError: need more than 0 values to unpack
This is because the YAML constructor starts at the outer layer of nesting and constructs the object before all of its child nodes are finished. Is there a way to reverse the order and start with the most deeply nested objects first? Alternatively, is there a way to get construction to happen at least after the node's objects have been loaded?
Well, what do you know. The solution I found was so simple, yet not so well documented.
The Loader class documentation clearly shows that the construct_mapping method takes only a single parameter (node). However, after considering writing my own constructor, I checked out the source, and the answer was right there! The method also takes a parameter deep (default False). So the correct call to use in the constructor is loader.construct_mapping(node, deep=True).
I guess PyYaml could use some additional documentation, but I'm grateful that it already exists.
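For reference, here is a minimal sketch of the question's second Foo with deep=True applied. It assumes a recent PyYAML, where the loader class is passed explicitly to both add_constructor and load (the question's original calls predate that), and it passes the arguments to Foo in the order its __init__ declares them.

```python
import yaml

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        # assume two-value list for l
        self.l1, self.l2 = l
        self.d = d

def foo_constructor(loader, node):
    # deep=True builds nested lists and dicts completely before returning
    values = loader.construct_mapping(node, deep=True)
    return Foo(values["s"], values["l"], values["d"])

yaml.add_constructor(u'!Foo', foo_constructor, Loader=yaml.SafeLoader)

f = yaml.load('''
--- !Foo
s: 1
l: [1, 2]
d: {try: this}''', Loader=yaml.SafeLoader)

print(f.s, f.l1, f.l2, f.d)
```

With the nested list fully built, the two-value unpacking in __init__ no longer fails.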
There are several problems with your code (and your solution); let's address them step by step.

The code you present will not print what it says in the bottom line comment ('Foo(1, {'try': 'this'}, [1, 2])'). As there is no __str__() defined for Foo, it prints the default object representation instead. This is easily remedied by adding a __str__() method to Foo and then looking at the output again.
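A minimal sketch of such a method, assuming it should show s, d and l in the order used by the comment in the question (the exact format string is an assumption). To stay self-contained it skips the YAML loading and calls Foo with the same positional order that foo_constructor() uses.

```python
class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l = l
        self.d = d

    def __str__(self):
        # show the scalar, the dict and the list, in that order
        return 'Foo({s}, {d}, {l})'.format(**self.__dict__)

s, l, d = 1, [1, 2], {'try': 'this'}
f = Foo(s, d, l)   # same positional order as in foo_constructor()
print(f)           # Foo(1, [1, 2], {'try': 'this'})
```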
This is close, but not what you promised in the comment either. The list and the dict are swapped, because in your foo_constructor() you create Foo() with the wrong order of parameters.

This points to a more fundamental problem: your foo_constructor() needs to know too much about the object it is creating. Why is this so? It is not just the parameter order. Try loading a document that leaves out d. One would expect this to print Foo(1, None, [1, 2]) (with the default value of the non-specified d keyword argument). What you get is a KeyError exception on d = values["d"].

You can of course use get('d'), etc., in foo_constructor() to solve this, but you have to realise that for correct behaviour you must specify the default values from your Foo.__init__() (which in your case just happen to be all None) for each and every parameter with a default value. Keeping this updated is of course a maintenance nightmare.
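The difference between indexing and get() can be seen without any YAML at all. The sketch below assumes values is the plain dict that construct_mapping would return for a document that omits d; the __str__ format is an assumption as well.

```python
class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l = l
        self.d = d

    def __str__(self):
        return 'Foo({s}, {d}, {l})'.format(**self.__dict__)

# stand-in for what construct_mapping returns when the document has no d
values = {"s": 1, "l": [1, 2]}

try:
    values["d"]                      # what the original constructor does
except KeyError as exc:
    missing = exc.args[0]            # the key that was not found

# the get() variant has to repeat every default from Foo.__init__()
f = Foo(values["s"], l=values.get("l", None), d=values.get("d", None))
print(missing)
print(f)                             # Foo(1, None, [1, 2])
```

Every default written here duplicates one in Foo.__init__(), which is the maintenance burden described above.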
So scrap the whole foo_constructor and replace it with something that looks more like how PyYAML does this internally. This handles missing (default) parameters and doesn't have to be updated if the defaults for your keyword arguments change.
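One way to sketch this, modelled on the two-step pattern PyYAML uses for its own Python object tags: create an empty instance, yield it to the loader, and only then build the state and call __init__() with it as keyword arguments. This is an illustration under the assumption of a recent PyYAML with an explicit loader class, not necessarily the exact code the answer had in mind.

```python
import yaml

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        # assume two-value list for l
        self.l1, self.l2 = l
        self.d = d

def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)   # step 1: an uninitialised instance
    yield instance                # the loader registers it for this node
    # step 2: runs later, with the nested nodes built completely
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)    # defaults in __init__ cover missing keys

yaml.add_constructor(u'!Foo', foo_constructor, Loader=yaml.SafeLoader)

f = yaml.load('''
--- !Foo
s: 1
l: [1, 2]
''', Loader=yaml.SafeLoader)

print(f.s, f.l1, f.l2, f.d)
```

Because the mapping keys are passed straight through as keyword arguments, the constructor no longer needs to know the parameter order or the default values.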
All of this fits in a complete example, including a self-referential use of the object (always tricky).
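A sketch of what such a self-referential case might look like, assuming a recent PyYAML with an explicit loader class. The anchor name foo and the key me are hypothetical choices for this illustration; the instance is stored inside its own d mapping through an alias.

```python
import yaml

class Foo(object):
    def __init__(self, s, l=None, d=None):
        self.s = s
        self.l1, self.l2 = l
        self.d = d

def foo_constructor(loader, node):
    instance = Foo.__new__(Foo)
    yield instance                # registered before the children are built
    state = loader.construct_mapping(node, deep=True)
    instance.__init__(**state)

yaml.add_constructor(u'!Foo', foo_constructor, Loader=yaml.SafeLoader)

# &foo anchors the object itself, *foo refers back to it from inside d
f = yaml.load('''
--- !Foo &foo
s: 1
l: [1, 2]
d: {try: this, me: *foo}
''', Loader=yaml.SafeLoader)

print(f.l1, f.l2, f.d['try'])
print(f.d['me'] is f)
```

The alias can be resolved because the instance was already yielded to the loader before its state was constructed; a constructor that returns the finished object in one step cannot support this.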
This was tested using ruamel.yaml (of which I am the author), which is an enhanced version of PyYAML. The solution should work the same for PyYAML itself.
In addition to your own answer, scicalculator: if you wish to not have to remember this flag next time, and/or wish to have a more object-oriented approach, you can use yamlable. I wrote it to ease the YAML-to-object binding for our production code.
This is how you would write your example:
yields
and you can dump too:
Note how the two methods to_yaml_dict and from_yaml_dict can be overridden so as to customize the mapping in both directions.