Generating anchors with PyYAML.dump()?

Posted 2019-07-17 05:12

I'd like to be able to generate anchors in the YAML generated by PyYAML's dump() function. Is there a way to do this? Ideally the anchors would have the same name as the YAML nodes.

Example:

import yaml
yaml.dump({'a': [1,2,3]})
'a: [1, 2, 3]\n'

What I'd like to be able to do is generate YAML like:

import yaml
yaml.dump({'a': [1,2,3]})
'a: &a [1, 2, 3]\n'

Can I write a custom emitter or dumper to do this? Is there another way?

3 answers
smile是对你的礼貌
Reply #2 · 2019-07-17 05:31

This is not so easy unless the data you want to use for the anchor is inside the node itself. The anchor is attached to the node contents (in your example '[1,2,3]'), and that node does not know that it is the value associated with key 'a'.

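import yaml

l = [1, 2, 3]
foo = {'a': l, 'b': l}  # the same list object appears twice, so an anchor is needed

class SpecialAnchor(yaml.Dumper):

    def generate_anchor(self, node):
        # Called only for nodes that are referenced more than once.
        print('Generating anchor for {}'.format(str(node)))
        anchor = super().generate_anchor(node)
        print('Generated "{}"'.format(anchor))
        return anchor

# default_flow_style=None keeps the short lists inline, as in the output below.
y1 = yaml.dump(foo, Dumper=SpecialAnchor, default_flow_style=None)
print(y1)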

Gives you:

Generating anchor for SequenceNode(tag='tag:yaml.org,2002:seq', value=[ScalarNode(tag='tag:yaml.org,2002:int', value='1'), ScalarNode(tag='tag:yaml.org,2002:int', value='2'), ScalarNode(tag='tag:yaml.org,2002:int', value='3')])
Generated "id001"
a: &id001 [1, 2, 3]
b: *id001

So far I haven't found a way to get the key 'a' given the node...
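One possible direction, offered as a sketch rather than a finished solution: the value node has no back-pointer to its key, but the parent MappingNode stores its entries as (key_node, value_node) pairs, so the key is available one level up. The snippet below uses yaml.compose() only to get a node graph to inspect; the variable names are illustrative.

```python
import yaml

# compose() parses YAML text into the same kind of node graph the dumper serializes.
root = yaml.compose("a: [1, 2, 3]\nb: hello\n")

# A MappingNode's .value is a list of (key_node, value_node) tuples.
pairs = [(key_node.value, type(value_node).__name__)
         for key_node, value_node in root.value]
print(pairs)
```

So a dumper that wants key-based anchor names would probably have to collect them while visiting the mapping node, not when generate_anchor() is called on the value.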

走好不送
Reply #3 · 2019-07-17 05:51

By default, anchors are only emitted when the dumper detects a reference to an object it has already seen:

>>> import yaml
>>>
>>> foo = {'a': [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.safe_dump(doc, default_flow_style=False))
- &id001
  a:
  - 1
  - 2
  - 3
- *id001

If you want to override how the anchor is named, you'll have to customize the Dumper class, specifically the generate_anchor() method. The ANCHOR_TEMPLATE class attribute (default 'id%03d') may also be useful.
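As a minimal sketch of the ANCHOR_TEMPLATE route (the class name RefDumper and the template 'ref%02d' are made up for illustration), changing the template alters the generated names but still only produces anchors for objects that are referenced more than once:

```python
import yaml

class RefDumper(yaml.SafeDumper):
    # Hypothetical template; the base class formats a running counter with 'id%03d'.
    ANCHOR_TEMPLATE = 'ref%02d'

shared = [1, 2, 3]
# The same list object is used twice, so the dumper emits an anchor and an alias.
text = yaml.dump({'a': shared, 'b': shared}, Dumper=RefDumper)
print(text)
```

The first occurrence should carry &ref01 and the second should be the alias *ref01; the names are still counter-based, not derived from the key.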

In your example the key is a simple string, but you need to take into account the many possibilities for YAML keys, i.e. a key could be a sequence rather than a single value:

>>> import yaml
>>>
>>> foo = {('a', 'b', 'c'): [1,2,3]}
>>> doc = (foo,foo)
>>>
>>> print(yaml.dump(doc, default_flow_style=False))
!!python/tuple
- &id001
  ? !!python/tuple
  - a
  - b
  - c
  : - 1
    - 2
    - 3
- *id001
我欲成王,谁敢阻挡
Reply #4 · 2019-07-17 05:53

I wrote a custom Dumper subclass to force an anchor value for top-level nodes. It does not simply override the anchor string (using generate_anchor), but actually forces the anchor to be emitted, even if the node is not referenced later:

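import yaml

class CustomAnchor(yaml.Dumper):
    def __init__(self, *args, **kwargs):
        super(CustomAnchor, self).__init__(*args, **kwargs)
        self.depth = 0        # number of anchor_node() calls so far
        self.basekey = None   # original text of the top-level key
        self.newanchors = {}  # value node -> anchor name to force

    def anchor_node(self, node):
        self.depth += 1
        if self.depth == 2:
            # Second call: the key node of the single top-level pair.
            assert isinstance(node, yaml.ScalarNode), "yaml node not a string: %s" % node
            self.basekey = str(node.value)
            node.value = self.basekey + "_ALIAS"
        if self.depth == 3:
            # Third call: the value node; remember the anchor to force on it.
            assert self.basekey, "could not find base key for value: %s" % node
            self.newanchors[node] = self.basekey
        super(CustomAnchor, self).anchor_node(node)
        if self.newanchors:
            # Replace the None placeholder set by the base class so the anchor is emitted.
            self.anchors.update(self.newanchors)
            self.newanchors.clear()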

Note that I rewrite the key to carry an "_ALIAS" suffix; you could remove that line to keep the key and the anchor name identical, or change the suffix to something else.

E.g. dumping {'FOO': 'BAR'} results in:

FOO_ALIAS: &FOO BAR

Also, I only wrote it to deal with single top level key/value pairs at a time, and it will only force an anchor for the top level key. If you want to turn a dict into a YAML file with all the keys being top level YAML nodes, you will need to iterate over the dict and dump each key/value pair as {key:value}, or rewrite this class to handle a dict with multiple keys.
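For the "rewrite this class to handle a dict with multiple keys" option, here is one possible sketch. It is not the class above: TopLevelAnchorDumper and its forced_anchors attribute are hypothetical names, it subclasses yaml.SafeDumper instead of yaml.Dumper, it leaves the keys unchanged, and it silently skips keys that are not plain scalars or that contain characters PyYAML's emitter rejects in anchor names.

```python
import re
import yaml

# Anchor names accepted by PyYAML's emitter: letters, digits, '-' and '_'.
_VALID_ANCHOR = re.compile(r'^[0-9A-Za-z_-]+$')

class TopLevelAnchorDumper(yaml.SafeDumper):
    """Hypothetical dumper: anchors every top-level value with its key name."""

    def serialize(self, node):
        # Map each top-level value node to the text of its key.
        self.forced_anchors = {}
        if isinstance(node, yaml.MappingNode):
            for key_node, value_node in node.value:
                if (isinstance(key_node, yaml.ScalarNode)
                        and _VALID_ANCHOR.match(key_node.value)):
                    self.forced_anchors[value_node] = key_node.value
        super().serialize(node)

    def anchor_node(self, node):
        # Let the base class register the node, then replace its entry
        # with the forced name so the anchor is always emitted.
        super().anchor_node(node)
        name = self.forced_anchors.get(node)
        if name is not None:
            self.anchors[node] = name

text = yaml.dump({'a': [1, 2, 3], 'b': 'x'}, Dumper=TopLevelAnchorDumper,
                 default_flow_style=None)
print(text)
```

With default_flow_style=None this should print a: &a [1, 2, 3] and b: &b x on separate lines. A key that happens to look like a generated name (for example id001) could clash with an automatically generated anchor, which this sketch does not guard against.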
