Loading and dumping multiple yaml files with ruamel.yaml

Posted 2019-08-15 10:22

Using Python 2 (at the moment) and ruamel.yaml 0.13.14 (from Red Hat EPEL).

I'm currently writing some code to load YAML definitions that are split across multiple files. The user-editable part contains, e.g.:

users:
  xxxx1:
    timestamp: '2018-10-22 11:38:28.541810'
    << : *userdefaults
  xxxx2:
    << : *userdefaults
    timestamp: '2018-10-22 11:38:28.541810'

The defaults are stored in another file, which is not editable:

userdefaults: &userdefaults
    # Default values for user settings
    fileCountQuota: 1000
    diskSizeQuota: "300g"

I can process these together by loading both files, concatenating the strings, and running the result through merged_data = list(yaml.load_all("{}\n{}".format(defaults_data, user_data), Loader=yaml.RoundTripLoader)), which correctly resolves everything. (When not using RoundTripLoader I get errors that the references cannot be resolved, which is normal.)

Now I want to make some updates from Python code (e.g. update the timestamp), and for that I need to write back just the user part. That's where things get hairy: so far I haven't found a way to write only that YAML document, not both.

2 answers
孤傲高冷的网名
Reply #2 · 2019-08-15 10:49

First of all, unless there are multiple documents in your defaults file, you don't have to use load_all, because you are not concatenating two documents into a multiple-document stream. If you had, by using a format string with a document-end marker ("{}\n...\n{}") or with a directives-end marker ("{}\n---\n{}"), your aliases would not carry over from one document to the other, as per the YAML specification:

It is an error for an alias node to use an anchor that does not previously occur in the document.

The anchor has to be in the document, not just in the stream (which can consist of multiple documents).
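
To make the scoping rule concrete, here is a rough sketch that scans a stream and reports aliases with no earlier anchor in the same document. It is only an illustration: the function name unresolved_aliases and the regular expression are my own assumptions, and it is not a YAML parser (it ignores quoted strings, block scalars and comments).

```python
import re

# Matches '&name' (anchor) and '*name' (alias) tokens. This is a deliberate
# simplification: a real YAML parser also handles quoting, comments, etc.
TOKEN_RE = re.compile(r'([&*])([A-Za-z0-9_-]+)')


def unresolved_aliases(stream):
    """Return the names of aliases that have no earlier anchor in the
    same document of the given YAML stream (hypothetical helper)."""
    problems = []
    anchors = set()
    for line in stream.splitlines():
        if line.rstrip() in ('---', '...'):
            # A document marker starts a fresh anchor scope.
            anchors = set()
            continue
        for kind, name in TOKEN_RE.findall(line):
            if kind == '&':
                anchors.add(name)
            elif name not in anchors:
                problems.append(name)
    return problems


defaults_data = "userdefaults: &userdefaults\n    fileCountQuota: 1000\n"
user_data = "users:\n  xxxx1:\n    << : *userdefaults\n"

# One document: the alias follows its anchor, so nothing is reported.
print(unresolved_aliases("{}\n{}".format(defaults_data, user_data)))
# Two documents: the anchor is out of scope for the alias.
print(unresolved_aliases("{}\n---\n{}".format(defaults_data, user_data)))
```

With the plain "{}\n{}" concatenation the two files form a single document, which is why the alias resolves there and would not with either marker in between.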


I tried some hocus pocus, pre-populating the already represented dictionary of anchored nodes:

import sys
import datetime
from ruamel import yaml

def load():
    with open('defaults.yaml') as fp:
        defaults_data = fp.read()
    with open('user.yaml') as fp:
        user_data = fp.read()
    merged_data = yaml.load("{}\n{}".format(defaults_data, user_data), 
                            Loader=yaml.RoundTripLoader)
    return merged_data

class MyRTDGen(object):
    class MyRTD(yaml.RoundTripDumper):
        def __init__(self, *args, **kw):
            pps = kw.pop('pre_populate', None)
            yaml.RoundTripDumper.__init__(self, *args, **kw)
            if pps is not None:
                for pp in pps:
                    try:
                        anchor = pp.yaml_anchor()
                    except AttributeError:
                        anchor = None
                    node = yaml.nodes.MappingNode(
                        u'tag:yaml.org,2002:map', [], flow_style=None, anchor=anchor)
                    self.represented_objects[id(pp)] = node

    def __init__(self, pre_populate=None):
        assert isinstance(pre_populate, list)
        self._pre_populate = pre_populate 

    def __call__(self, *args, **kw):
        kw1 = kw.copy()
        kw1['pre_populate'] = self._pre_populate
        myrtd = self.MyRTD(*args, **kw1)
        return myrtd


def update(md, file_name):
    ud = md.pop('userdefaults')
    MyRTD = MyRTDGen([ud])
    yaml.dump(md, sys.stdout, Dumper=MyRTD)
    with open(file_name, 'w') as fp:
        yaml.dump(md, fp, Dumper=MyRTD)

md = load()
md['users']['xxxx2']['timestamp'] = str(datetime.datetime.utcnow())
update(md, 'user.yaml')

Since the PyYAML-based API requires a class instead of an object, you need a class generator that adds the data elements to pre-populate on the fly from within yaml.dump().

But this doesn't work, as a node only gets written out with an anchor once it is determined that the anchor is used (i.e. there is a second reference). So actually the first merge key gets written out as an anchor. And although I am quite familiar with the code base, I could not get this to work properly in a reasonable amount of time.

So instead, I would rely on the fact that only one root-level key in the dump of the combined, updated file matches the first key of user.yaml, and strip everything before it:

import sys
import datetime
from ruamel import yaml

with open('defaults.yaml') as fp:
    defaults_data = fp.read()
with open('user.yaml') as fp:
    user_data = fp.read()
merged_data = yaml.load("{}\n{}".format(defaults_data, user_data), 
                        Loader=yaml.RoundTripLoader)

# find the key
for line in user_data.splitlines():
    line = line.split('# ')[0].rstrip()  # end of line comment, not checking for strings
    if line and line[-1] == ':' and line[0] != ' ':
        split_key = line
        break

merged_data['users']['xxxx2']['timestamp'] = str(datetime.datetime.utcnow())

buf = yaml.compat.StringIO()
yaml.dump(merged_data, buf, Dumper=yaml.RoundTripDumper)
document = split_key + buf.getvalue().split('\n' + split_key)[1]
sys.stdout.write(document)

which gives:

users:
  xxxx1:
    <<: *userdefaults
    timestamp: '2018-10-22 11:38:28.541810'
  xxxx2:
    <<: *userdefaults
    timestamp: '2018-10-23 09:59:13.829978'
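
The key search and the stripping step can also be factored into two small helpers that work on plain strings and fail loudly when the key is missing. This is a minimal sketch under the same assumptions as the code above (the key sits unindented on its own line, and '# ' never occurs inside a string); the names first_top_level_key and strip_before_key are hypothetical, not part of ruamel.yaml.

```python
def first_top_level_key(yaml_text):
    """Return the first unindented 'key:' line of yaml_text, or None."""
    for line in yaml_text.splitlines():
        # Drop an end-of-line comment; like the loop above, this does not
        # account for '# ' appearing inside a quoted string.
        line = line.split('# ')[0].rstrip()
        if (line and line.endswith(':')
                and not line[0].isspace() and not line.startswith('#')):
            return line
    return None


def strip_before_key(dumped, split_key):
    """Return the part of dumped that starts at the top-level line split_key."""
    if dumped.startswith(split_key + '\n'):
        return dumped  # nothing precedes the key
    head, found, tail = dumped.partition('\n' + split_key + '\n')
    if not found:
        raise ValueError('top-level key {!r} not found in dump'.format(split_key))
    return split_key + '\n' + tail


user_data = "# user settings\nusers:\n  xxxx1:\n    << : *userdefaults\n"
dumped = ("userdefaults: &userdefaults\n  fileCountQuota: 1000\n\n"
          "users:\n  xxxx1:\n    <<: *userdefaults\n")
print(strip_before_key(dumped, first_top_level_key(user_data)))
```

Matching the key together with the newlines around it avoids cutting at an indented key that happens to have the same text.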

I had to make a virtualenv to make sure I could run the above with ruamel.yaml==0.13.14. That version is from the time I was still young (I won't claim to have been innocent). There have been over 85 releases of the library since then.

I can understand that you might not be able to run anything but Python 2 at the moment and cannot compile or use a newer version. But what you really should do is install virtualenv (this can be done using EPEL, but also without further "polluting" your system installation), make a virtualenv for the code you are developing, and install the latest version of ruamel.yaml (and your other libraries) in there. You can do the same if you need to distribute your software to other systems: just install virtualenv there as well.

I have all my utilities under /opt/util, managed by virtualenvutils, a wrapper around virtualenv.

贪生不怕死
Reply #3 · 2019-08-15 11:11

To write the user part, you will have to manually split the output of yaml.dump() for the combined data and write the appropriate part back to the users YAML file.

import datetime

import ruamel.yaml
from ruamel.yaml.compat import StringIO  # works on both Python 2 and 3

# note: the YAML() class needs a newer ruamel.yaml than 0.13.14
yaml = ruamel.yaml.YAML(typ='rt')

with open('defaults.yaml', 'r') as defaults:
    with open('users.yaml', 'r') as users:
        # concatenate into a single document so the alias can see the anchor
        raw = "{}\n{}".format(defaults.read(), users.read())
        data = list(yaml.load_all(raw))

data[0]['users']['xxxx1']['timestamp'] = datetime.datetime.now().isoformat()

# dump to a buffer first, so users.yaml is not truncated if dumping fails
sio = StringIO()
yaml.dump(data[0], sio)
out = sio.getvalue()

with open('users.yaml', 'w') as outfile:
    # the second blank-line-separated part is the contents of users.yaml
    outfile.write(out.split('\n\n')[1])