I have some preprocessing to do with some existing .yml files - however, some of them have Jinja template syntax embedded in them:
A:
 B:
  - ip: 1.2.3.4
  - myArray:
    - {{ jinja.variable }}
    - val1
    - val2
I'd like to read in this file and add val3 under myArray, as such:
A:
 B:
  - ip: 1.2.3.4
  - myArray:
    - {{ jinja.variable }}
    - val1
    - val2
    - val 3
I tried manually writing out the jinja templates, but they got written with single quotes around them: '{{ jinja.variable }}'
What's the recommended way for me to read such .yml files and modify them, even though they contain preexisting Jinja syntax? I'd like to add information to these files, keeping all else the same.
I tried the above using PyYAML on Python 2.7+.
The solution in this answer has been incorporated into ruamel.yaml using a plugin mechanism. At the bottom of this post there are quick-and-dirty instructions on how to use that.
There are three aspects in updating a YAML file that contains jinja2 "code":
Let's start by making your example somewhat more realistic by adding a jinja2 variable definition and for-loop and adding some comments (input.yaml):

The lines starting with {% contain no YAML, so we'll make those into comments (assuming that comments are preserved on round-trip, see below). Since YAML scalars cannot start with { without being quoted, we'll change the {{ to <{. This is done in the following code by calling sanitize() (which also stores the patterns used), and the reverse is done in sanitize.reverse (using the stored patterns).

The preservation of your YAML code (block-style etc.) is best done using ruamel.yaml (disclaimer: I am the author of that package); that way you don't have to worry about flow-style elements in the input getting mangled into block style, as happens with the rather crude default_flow_style=False that the other answers use. ruamel.yaml also preserves comments, both the ones that were originally in the file and those temporarily inserted to "comment out" jinja2 constructs starting with {%.

The resulting code:
which prints (specify a second parameter to update_one() to write to a file) using Python 2.7:

If neither #{ nor <{ are in any of the original inputs, then sanitizing and reverting can be done with simple one-line functions (see the earlier versions of this post), and then you don't need the class Sanitize.
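For the simple case just described, the two one-line functions could look like the sketch below. It is only an illustration under the stated assumption that neither marker occurs in the input. The function names sanitize and revert are hypothetical, and the pairing of markers (<{ standing in for {{, and #{ standing in for {%) is an assumption based on the markers named in the text.

```python
def sanitize(text):
    """Hide jinja2 markers so the text parses as YAML (assumed marker choice)."""
    # '{{' -> '<{' : the scalar no longer starts with a flow-mapping brace
    # '{%' -> '#{' : the statement line becomes a YAML comment
    return text.replace('{{', '<{').replace('{%', '#{')


def revert(text):
    """Undo sanitize(); only safe if '<{' and '#{' were absent from the original."""
    return text.replace('#{', '{%').replace('<{', '{{')


template = (
    "{% set var = 'x' %}\n"
    "A:\n"
    "  - {{ jinja.variable }}\n"
    "  - val1\n"
)
as_yaml = sanitize(template)    # this string can be handed to a round-trip YAML loader
restored = revert(as_yaml)      # after dumping, put the jinja2 markers back
```

Between the two calls you would load, modify and dump the sanitized text with a comment-preserving YAML library, so that the commented-out {% lines survive the round trip.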
Your example is indented with one position (key B) as well as two positions (the sequence elements); ruamel.yaml doesn't have that fine control over output indentation (and I don't know of any YAML parser that does). The indent (defaulting to 2) is applied both to YAML mappings and to sequence elements (measured to the beginning of the element, not to the dash). This has no influence on re-reading the YAML, and it happened to the output of the other two answerers as well (without them pointing out this change).

Also note that YAML().load() is safe (i.e. doesn't load arbitrary potentially malicious objects), whereas the yaml.load() as used in the other answers is definitely unsafe; it says so in the documentation and is even mentioned in the Wikipedia article on YAML. If you use yaml.load(), you would have to check each and every input file to make sure there are no tagged objects that could cause your disc to be wiped (or worse).

If you need to update your files repeatedly, and have control over the jinja2 templating, it might be better to change the patterns for jinja2 once and not revert them, and then specify appropriate block_start_string, variable_start_string (and possibly block_end_string and variable_end_string) when creating the jinja2.Environment (the one to which your jinja2.FileSystemLoader is added as loader).

If the above seems too complicated, then in a virtualenv do:
assuming you have the input.yaml from before, you can run:

to get the diff output:

ruamel.yaml 0.15.7 implements a new plug-in mechanism and ruamel.yaml.jinja2 is a plug-in that rewraps the code in this answer transparently for the user. Currently the information for reversion is attached to the YAML() instance, so make sure you do yaml = YAML(typ='jinja2') for each file you process (that information could be attached to the top-level data instance, just like the YAML comments are).

In their current format, your .yml files are jinja templates which will not be valid yaml until they have been rendered. This is because the jinja placeholder syntax conflicts with yaml syntax, as braces ({ and }) can be used to represent mappings in yaml.

One way to work around this is to replace the jinja placeholders with something else, process the file as yaml, then reinstate the placeholders.
Open the file as a text file
The regular expression r'{{\s*(?P<jinja>[a-zA-Z_][a-zA-Z0-9_]*)\s*}}' will match any jinja placeholders in the text; the named group jinja in the expression captures the variable name. The regular expression is the same as that used by Jinja2 to match variable names.

The re.sub function can reference named groups in its replacement string using the \g syntax. We can use this feature to replace the jinja syntax with something that does not conflict with yaml syntax and does not already appear in the files that you are processing. For example, replace {{ ... }} with << ... >>.

Now load the text as yaml:
Add the new value:
Serialise back to a yaml string:
Now reinstate the jinja syntax.
And write the yaml to disk.
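The substitution steps above can be sketched as follows. This is a minimal illustration, not the original answer's code: the function names hide_placeholders and restore_placeholders are hypothetical, the braces are escaped for clarity, and a dot is allowed in the name as an assumption so that dotted references such as {{ jinja.variable }} match (the expression quoted above has no dot). The YAML load, modify and dump steps are left as a comment, since they depend on the YAML library you use.

```python
import re

# Assumption: '.' is allowed in the name so that {{ jinja.variable }} matches.
JINJA_RE = re.compile(r'\{\{\s*(?P<jinja>[a-zA-Z_][a-zA-Z0-9_.]*)\s*\}\}')
ANGLE_RE = re.compile(r'<<(?P<jinja>[a-zA-Z_][a-zA-Z0-9_.]*)>>')


def hide_placeholders(text):
    """Replace {{ name }} with <<name>>, which YAML reads as a plain scalar."""
    return JINJA_RE.sub(r'<<\g<jinja>>>', text)


def restore_placeholders(text):
    """Turn <<name>> back into {{ name }} (whitespace is normalised to one space)."""
    return ANGLE_RE.sub(r'{{ \g<jinja> }}', text)


source = "- {{ jinja.variable }}\n- val1\n- val2\n"
hidden = hide_placeholders(source)
# ... load `hidden` with a YAML library, append the new value, dump it again ...
restored = restore_placeholders(hidden)
```

Note that the round trip only reproduces the original spacing when the placeholders were written with a single space inside the braces, and that it assumes << and >> do not otherwise occur in the files.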
One way to do this is to use the jinja2 parser itself to parse the template and output an alternate format.

Jinja2 Code:

This code inherits from the Jinja2 Parser, Lexer and Environment classes to parse inside variable blocks (usually {{ }}). Instead of evaluating the variables, this code changes the text to something that yaml can understand. The exact same code can be used to reverse the process with an exchange of the delimiters. By default it translates to the delimiters suggested by snakecharmerb.

How/Why?

The jinja2 parser scans the template file looking for delimiters. When it finds delimiters, it switches to parsing the material between them. The changes in the code here insert themselves into the lexer and parser to capture the text seen during template compilation; then, on finding the terminating delimiter, they concatenate the parsed tokens into a string and insert it as a jinja2.nodes.Const parse node in place of the compiled jinja code, so that when the template is rendered the string is inserted instead of a variable expansion.

The MyEnvironment() code is used to hook in the custom parser and lexer extensions. While at it, I added some parameter processing.
The primary advantage of this approach is that it should be fairly robust to parsing whatever jinja will parse.
User Code:
Test Code:
data.yml
Results: