In searching around the web for usages of custom constructors I see things like this:
def some_constructor(loader, node):
value = loader.construct_mapping(node, deep=True)
return SomeClass(value)
What does the deep=True
do? I don't see it in the pyyaml documentation.
It looks like I need it; I have a yaml file generated by a pyyaml representer and it includes node anchors and aliases (like &id003
and *id003
); without deep=True
I get a shallow map back for those objects containing anchors/aliases.
That you don't see deep=True
in the documentation is because you don't normally need to use it as an end-user of the PyYAML package.
If you trace the use of methods in constructor.py
that use deep=
you come to construct_mapping()
and construct_sequence()
in class BaseConstructor()
and both of these call BaseConstructor.construct_object()
.
The relevant code in that method to study is:
if tag_suffix is None:
data = constructor(self, node)
else:
data = constructor(self, tag_suffix, node)
if isinstance(data, types.GeneratorType):
generator = data
data = next(generator)
if self.deep_construct:
for dummy in generator:
pass
else:
self.state_generators.append(generator)
and in particular the for
loop in there, which only gets executed if deep=True
was passed in.
Rougly said if the data comes from a constructor is a generator, then it walks over that data (in the for
loop) until the generator is exhausted. With that mechanism, those constructors can contain a yield
to create a base object, of which the details can be filled out after the yield
. Because of their being only one yield
in such constructors, e.g. for mappings (constructed as Python dict
s):
def construct_yaml_map(self, node):
data = {}
yield data
value = self.construct_mapping(node)
data.update(value)
I call this a two step process (one step to the yield
the next to the end of the method.
In such two-step constructors the data
to be yielded is constructed empty, yielded and then filled out. And that has to be so because of what you already noticed: recursion. If there is a self reference to data
somewhere underneath, data
cannot be constructed after all its children are constructed, because it would have to wait for itself to be constructed.
The deep
parameter indirectly controls whether objects that are potentially generators are recursively being built or appended to the list self.state_generators
to be resolved later on.
Constructing a YAML document then boils down to constructing the top-level objects and looping over the potentially recursive objects in self.state_generators
until no generators are left (a process that might take more than one pass).