What does deep=True do in pyyaml.Loader.construct_

Posted 2019-06-08 02:09

Question:

In searching around the web for usages of custom constructors I see things like this:

def some_constructor(loader, node):
    value = loader.construct_mapping(node, deep=True)
    return SomeClass(value)

What does deep=True do? I don't see it in the PyYAML documentation.

It looks like I need it: I have a YAML file generated by a PyYAML representer, and it includes node anchors and aliases (like &id003 and *id003). Without deep=True I get a shallow map back for the objects containing anchors/aliases.

Answer 1:

You don't see deep=True in the documentation because, as an end user of the PyYAML package, you don't normally need it.

If you trace the methods in constructor.py that take a deep= argument, you arrive at construct_mapping() and construct_sequence() in class BaseConstructor, and both of these call BaseConstructor.construct_object(). The relevant code in that method to study is:

    if tag_suffix is None:
        data = constructor(self, node)
    else:
        data = constructor(self, tag_suffix, node)
    if isinstance(data, types.GeneratorType):
        generator = data
        data = next(generator)
        if self.deep_construct:
            for dummy in generator:
                pass
        else:
            self.state_generators.append(generator)

and in particular the for loop in there, which only gets executed if self.deep_construct is set, i.e. if deep=True was passed in (to this call or to an enclosing one).
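To confirm where the flag lives without reading all of constructor.py, a quick sketch like the following (assuming PyYAML 5.x or 6.x is installed) prints the signatures of the three BaseConstructor methods on that path. In those releases each of them should show deep as a keyword parameter defaulting to False.

```python
import inspect

from yaml.constructor import BaseConstructor

# Print the signature of each method on the call path traced above.
for name in ("construct_mapping", "construct_sequence", "construct_object"):
    method = getattr(BaseConstructor, name)
    print(name, inspect.signature(method))  # each: (self, node, deep=False)
```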

Roughly speaking: if the data returned by a constructor is a generator, construct_object() keeps the first yielded value as the object and then runs the generator (in the for loop) until it is exhausted. With that mechanism, a constructor can contain a yield that hands out a base object, whose details are filled in after the yield. There is only one yield in such constructors, e.g. the one for mappings (constructed as Python dicts):

def construct_yaml_map(self, node):
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)

I call this a two-step process: one step up to the yield, the next from the yield to the end of the method.
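The two steps can be imitated with a plain Python generator. two_step_dict below is a hypothetical helper, not part of PyYAML; the comments mark which part of construct_object() each call corresponds to.

```python
def two_step_dict(items):
    """Hypothetical imitation of construct_yaml_map's two steps."""
    data = {}
    yield data          # step 1: hand out the (still empty) object
    data.update(items)  # step 2: runs only when the generator is resumed


gen = two_step_dict({"a": 1, "b": 2})
obj = next(gen)         # what construct_object() keeps as the result
print(obj)              # {}
for _ in gen:           # what the deep loop (or state_generators) does
    pass
print(obj)              # {'a': 1, 'b': 2}
```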

In such two-step constructors the data to be yielded is created empty, yielded, and then filled in. It has to be that way because of what you already noticed: recursion. If there is a self-reference to data somewhere underneath it, data cannot be constructed only after all its children are constructed, because it would have to wait for itself to be constructed.
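A minimal sketch of that recursion problem, assuming a SafeLoader subclass and a made-up !node tag (Node, one_step, two_step and load_with are hypothetical names). The one-step constructor should fail because the alias *root points back at the node that is still being built, while the two-step version hands out the empty object first, so the alias can resolve to it.

```python
import yaml
from yaml.constructor import ConstructorError

DOC = """
&root !node
name: root
self: *root
"""


class Node:
    pass


def one_step(loader, node):
    # Builds the object only after all of its children exist.
    obj = Node()
    obj.__dict__.update(loader.construct_mapping(node, deep=True))
    return obj


def two_step(loader, node):
    # Yields the empty object first, fills it in after the yield.
    obj = Node()
    yield obj
    obj.__dict__.update(loader.construct_mapping(node, deep=True))


def load_with(constructor):
    class NodeLoader(yaml.SafeLoader):  # subclass: SafeLoader stays untouched
        pass

    NodeLoader.add_constructor("!node", constructor)
    return yaml.load(DOC, Loader=NodeLoader)


root = load_with(two_step)
print(root.self is root)  # True

try:
    load_with(one_step)
except ConstructorError as exc:
    print("one_step failed:", exc.problem)  # found unconstructable recursive node
```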

The deep parameter indirectly controls whether objects that are potentially generators are built recursively right away or appended to the list self.state_generators, to be resolved later on.
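A minimal sketch of the difference, assuming a made-up !box tag on a SafeLoader subclass (make_box_constructor and load_box are hypothetical helpers). The constructor snapshots the mapping with copy.deepcopy at the moment it is called and also reads loader.state_generators, which is an internal attribute rather than public API. With deep=False the nested mapping should still be empty and one generator should be pending; with deep=True it is already filled in.

```python
import copy

import yaml

DOC = """
!box
inner:
  a: 1
  b: 2
"""


def make_box_constructor(deep):
    def construct_box(loader, node):
        value = loader.construct_mapping(node, deep=deep)
        # Snapshot now: postponed generators have not been run yet.
        pending = len(loader.state_generators)
        return copy.deepcopy(value), pending

    return construct_box


def load_box(deep):
    class BoxLoader(yaml.SafeLoader):
        pass

    BoxLoader.add_constructor("!box", make_box_constructor(deep))
    return yaml.load(DOC, Loader=BoxLoader)


print(load_box(deep=False))  # ({'inner': {}}, 1)
print(load_box(deep=True))   # ({'inner': {'a': 1, 'b': 2}}, 0)
```

Without the deepcopy, the deep=False result would eventually be filled in as well, because the postponed generator later updates the same dict object. The shallow map only shows up when a constructor reads or copies the values right away, as SomeClass(value) may do.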

Constructing a YAML document then boils down to constructing the top-level objects and looping over the potentially recursive objects in self.state_generators until no generators are left (a process that might take more than one pass).
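That loop can be modelled with a toy class. ToyConstructor and nested_map below are hypothetical and leave out nodes, tags, anchors and error checks, keeping only the generator handling from the excerpt above. Because draining one postponed generator (with deep_construct off) can postpone its children again, a nested structure in this model needs one pass per level, while deep=True fills everything in during the first call.

```python
import types


class ToyConstructor:
    """Hypothetical, heavily simplified model of BaseConstructor's
    generator handling; not the real PyYAML class."""

    def __init__(self):
        self.deep_construct = False
        self.state_generators = []

    def construct_object(self, constructor, deep=False):
        if deep:
            old_deep = self.deep_construct
            self.deep_construct = True
        data = constructor(self)
        if isinstance(data, types.GeneratorType):
            generator = data
            data = next(generator)  # step 1: the empty object
            if self.deep_construct:
                for _ in generator:  # step 2 right away
                    pass
            else:
                self.state_generators.append(generator)  # step 2 later
        if deep:
            self.deep_construct = old_deep
        return data

    def construct_document(self, constructor):
        data = self.construct_object(constructor)
        passes = 0
        # Running a postponed generator may postpone new ones.
        while self.state_generators:
            passes += 1
            pending, self.state_generators = self.state_generators, []
            for generator in pending:
                for _ in generator:
                    pass
        return data, passes


def nested_map(depth):
    """Hypothetical two-step constructor for dicts nested `depth` levels."""

    def construct(loader):
        data = {}
        yield data
        if depth > 0:
            data["child"] = loader.construct_object(nested_map(depth - 1))

    return construct


print(ToyConstructor().construct_document(nested_map(2)))
# ({'child': {'child': {}}}, 3)

loader = ToyConstructor()
print(loader.construct_object(nested_map(2), deep=True), loader.state_generators)
# {'child': {'child': {}}} []
```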