Is there a simple way to use a preprocessor / macro-processor with YAML files? (I'm thinking of something along the lines of the C preprocessor.)
We have a lot of flat text files that describe various data structures. They're currently in our own in-house format and are read with an in-house parser. I'd like to switch to YAML files to make use of the various existing libraries for reading and writing.
However, our files are hierarchical: we "include" master files into sub-files and use variable substitution to generate new data structures.
As a toy example, I'd want a master file, country_master.yaml, like:
name: $COUNTRY$
file: C:\data\$COUNTRY$
and another file that includes it once per country:
#define $COUNTRY$ UK
#include <country_master.yaml>
#define $COUNTRY$ USA
#include <country_master.yaml>
Then after preprocessing we'd get something like:
name: USA
file: C:\data\USA
The C preprocessor won't work, because # is also the YAML comment character. Also, ideally we'd like to have loops that are expanded by the preprocessor, so that in the above example we'd create UK and USA with a single loop (and I don't believe you can loop with cpp).
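The expansion asked for here (one copy of the master data per value, with $COUNTRY$ substituted, driven by a loop) can be sketched at the data level in plain Python, without any preprocessor. This is a minimal sketch under assumptions: COUNTRY_MASTER is a hypothetical dict standing in for the already-loaded country_master.yaml, and substitute/expand are illustrative helper names, not part of any YAML library.

```python
import re

# Matches placeholders written as $NAME$, as in the toy example.
PLACEHOLDER = re.compile(r"\$(\w+)\$")

# Hypothetical stand-in for country_master.yaml once it has been loaded.
COUNTRY_MASTER = {"name": "$COUNTRY$", "file": r"C:\data\$COUNTRY$"}


def substitute(node, variables):
    """Return a copy of node with $NAME$ replaced in every string value."""
    if isinstance(node, dict):
        return {key: substitute(value, variables) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, variables) for item in node]
    if isinstance(node, str):
        # An undefined variable raises KeyError instead of being left in place.
        return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), node)
    return node  # numbers, booleans and None pass through unchanged


def expand(template, name, values):
    """The loop: one substituted copy of template per value of variable name."""
    return [substitute(template, {name: value}) for value in values]


countries = expand(COUNTRY_MASTER, "COUNTRY", ["UK", "USA"])
print(countries)
```

Because the loop runs over loaded data, each generated entry is an ordinary dict that a YAML library can dump, rather than text that has to be re-parsed.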
Any ideas?
You are trying to change things at the level of the string representation of YAML, and I think you shouldn't. YAML can load objects, and by hooking into the parser those objects can influence elements that are loaded later. That way you can replace complete nodes with data, change values within scalars, etc.
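A small illustration of why substituting at the string level is fragile, using the standard library's json module as a stand-in for YAML (an assumption made so the sketch runs without a YAML package; with YAML the same thing happens when, for example, a value containing ': ' is pasted after name:). TEMPLATE_TEXT, VALUE and both helper functions are hypothetical.

```python
import json

# JSON stands in for YAML here; the point is the same for any structured format.
TEMPLATE_TEXT = '{"name": "$COUNTRY$"}'
VALUE = 'The "Old" Country'  # hypothetical value containing a double quote


def substitute_in_text(template_text, value):
    """String level: paste the value into the serialized text, then parse."""
    return json.loads(template_text.replace("$COUNTRY$", value))


def substitute_in_objects(template_text, value):
    """Object level: parse first, then change values inside the loaded strings."""
    data = json.loads(template_text)
    return {key: item.replace("$COUNTRY$", value) if isinstance(item, str) else item
            for key, item in data.items()}


try:
    substitute_in_text(TEMPLATE_TEXT, VALUE)
except json.JSONDecodeError as exc:
    print("string-level substitution broke the document:", exc)

print(substitute_in_objects(TEMPLATE_TEXT, VALUE))
```

The object-level version never has to worry about quoting or escaping, because the value is inserted after parsing.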
Let's assume you have this YAML file, main.yml:
- !YAMLPreProcessor
  verbose: '3'
  escape: ♦
- ♦replace(verbose)
- abcd
- ♦include(xyz.yml)
- xyz
and that xyz.yml contains:
k: 9
l: 8
m: [7. 6] # can be either
and that you use ♦ as the special character (it could be any single character, as long as the YAMLPreProcessor value for escape matches the character that starts the action keywords replace and include). You want this to be round-tripped (loaded into data in memory and then dumped) to the following YAML:
- !YAMLPreProcessor
  verbose: '3'
  escape: ♦
- '3'
- abcd
- k: 9
  l: 8
  m: [7. 6] # can be either
- xyz
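The action syntax used in main.yml (escape character, keyword, comma-separated arguments in parentheses, then any trailing text) can be sketched in isolation. parse_directive and apply_replace are hypothetical helper names; they follow the same str.split logic as the construct_scalar hook in the listing, except that they strip len(escape) characters instead of exactly one.

```python
def parse_directive(value, escape="♦"):
    """Split a scalar like '♦keyword(arg1,arg2)rest' into its parts.

    Returns (keyword, args, rest), or None when value does not start with
    the escape character. Raises ValueError if '(' or ')' is missing.
    """
    if not value or not value.startswith(escape):
        return None
    keyword, rest = value[len(escape):].split("(", 1)
    args, rest = rest.split(")", 1)
    return keyword, args.split(","), rest


def apply_replace(args, rest, settings):
    """What 'replace' does: concatenate the named settings, then append rest."""
    return "".join(str(settings[arg]) for arg in args) + rest


settings = {"escape": "♦", "verbose": "3"}  # the !YAMLPreProcessor mapping
for scalar in ["♦replace(verbose)", "abcd", "♦include(xyz.yml)"]:
    print(repr(scalar), "->", parse_directive(scalar, settings["escape"]))
```

Scalars that do not start with the escape character come back as None, which is the cue to hand them to the normal constructor unchanged.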
You can get that output by overloading the scalar constructor, which is called for each scalar, and by adding an appropriate YAMLPreProcessor class:
# coding: utf-8

from __future__ import print_function

import ruamel.yaml as yaml


def construct_scalar(loader, node):
    self = getattr(loader, '_yaml_preprocessor', None)
    if self and self.d.get('escape'):
        if node.value and node.value.startswith(self.d['escape']):
            # [1:] skips the escape character, so it must be a single character
            key_word, rest = node.value[1:].split('(', 1)
            args, rest = rest.split(')', 1)
            if key_word == 'replace':
                res = u''
                for arg in args.split(','):
                    res += str(self.d[arg])
                node.value = res + rest
            elif key_word == 'include':
                # load the file named between the parentheses, e.g. xyz.yml
                with open(args) as fp:
                    inc_yml = yaml.load(fp, Loader=yaml.RoundTripLoader)
                # this needs ruamel.yaml>=0.9.6
                return inc_yml
            else:
                print('keyword not found:', key_word)
    ret_val = loader._org_construct_scalar(node)
    # print('ret_val', type(ret_val), ret_val)
    return ret_val


class YAMLPreProcessor(object):
    def __init__(self, escape=None, verbose=0):
        self.d = dict(escape=escape, verbose=verbose)

    def __repr__(self):
        return "YAMLPreProcessor({escape!r}, {verbose})".format(**self.d)

    @staticmethod
    def __yaml_out__(dumper, self):
        return dumper.represent_mapping('!YAMLPreProcessor', self.d)

    @staticmethod
    def __yaml_in__(loader, data):
        from ruamel.yaml.comments import CommentedMap
        result = YAMLPreProcessor()
        # make the preprocessor visible to construct_scalar for later scalars
        loader._yaml_preprocessor = result
        z = dict()
        loader.construct_mapping(data, z)
        result.d = z
        yield result

    def __delete__(self):
        loader._yaml_preprocessor = None


# Not called by this script; shown for reference. It copes with
# construct_scalar returning a non-string (the AttributeError case).
# ScalarString and PY3 are names from ruamel.yaml itself.
def construct_yaml_str(self, node):
    value = self.construct_scalar(node)
    if isinstance(value, ScalarString):
        return value
    if PY3:
        return value
    try:
        return value.encode('ascii')
    except AttributeError:
        # in case you replace the node dynamically e.g. with a dict
        return value
    except UnicodeEncodeError:
        return value


loader = yaml.RoundTripLoader
loader.add_constructor('!YAMLPreProcessor', YAMLPreProcessor.__yaml_in__)
loader._org_construct_scalar = loader.construct_scalar
loader.construct_scalar = construct_scalar

with open('main.yml') as fp:
    data_from_yaml = yaml.load(fp, Loader=loader)
#print ('out', data_from_yaml)

dumper = yaml.RoundTripDumper
# need to be able to represent '!YAMLPreProcessor'
# but you can of course also remove the first element
# from data_from_yaml if you don't want the preprocessor in your output
dumper.add_representer(YAMLPreProcessor, YAMLPreProcessor.__yaml_out__)
print(yaml.dump(data_from_yaml, Dumper=dumper, allow_unicode=True))
The above needs a recent version of ruamel.yaml (0.9.6 or later), as older versions choke if construct_scalar returns a non-string object.
Please note that the position of the comment after the line with the m key is relative to the start of the line, and the example makes no compensation for the indent level of the node where the xyz.yml file is inserted.
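The hook relies on a plain Python technique: save the class's original method under another name (_org_construct_scalar), install a module-level function in its place, and fall back to the saved original for anything the wrapper does not handle. The sketch below shows only that mechanism, with hypothetical ToyNode and ToyLoader classes; they are not ruamel.yaml types, and _settings is an illustrative stand-in for _yaml_preprocessor.

```python
class ToyNode:
    """Hypothetical stand-in for a scalar node: it only carries .value."""

    def __init__(self, value):
        self.value = value


class ToyLoader:
    """Hypothetical stand-in for the loader class; not ruamel.yaml's API."""

    def construct_scalar(self, node):
        return node.value


def construct_scalar(loader, node):
    """Wrapper installed on the class; 'loader' is the instance in use."""
    # set on the instance by an earlier-loaded object (like !YAMLPreProcessor)
    settings = getattr(loader, "_settings", None)
    if settings and node.value.startswith(settings["escape"]):
        key = node.value[len(settings["escape"]):]
        return settings[key]  # the result need not be a string
    return loader._org_construct_scalar(node)  # fall back to the saved original


# The same two steps as in the listing: keep the original, then install the hook.
ToyLoader._org_construct_scalar = ToyLoader.construct_scalar
ToyLoader.construct_scalar = construct_scalar

loader = ToyLoader()
print(loader.construct_scalar(ToyNode("abcd")))      # no settings yet: original path
loader._settings = {"escape": "♦", "verbose": "3"}
print(loader.construct_scalar(ToyNode("♦verbose")))  # intercepted by the wrapper
```

Note that the listing patches yaml.RoundTripLoader itself (loader = yaml.RoundTripLoader binds the class, not a copy), so every round-trip load in the process goes through the hook. Patching a subclass instead would keep the change local.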
# Yamp - YAML Macro-Processor
# in master.yaml
name: country
args: [$COUNTRY$]
name: $COUNTRY$
file: C:\data\{{$COUNTRY$}}
# in some file
- include: [master.yaml]
# Call it wherever needed:
{ country: USA }