Different YAML array representations

Posted 2019-02-17 23:27

Question:

I'm writing a file type converter using Python and PyYAML for a project where I translate to and from YAML files multiple times. These files are then used by a separate service that I have no control over, so I need to translate the YAML back into the same form I originally received it in. My original file has sections like the following:

key:
- value1
- value2
- value3

This evaluates to {key: [value1,value2,value3]} using yaml.load(). When I translate this back to YAML, my new file reads like this:

key: [value1,value2,value3]

My question is whether these two forms are equivalent as far as the various language parsers for YAML files are concerned. With PyYAML they are clearly equivalent, but does this hold for Ruby or the other languages the application is using? If not, the application will not be able to display the data properly.

Answer 1:

Yes, to any YAML parser that follows the spec, they are equivalent. You can read the spec here: http://www.yaml.org/spec/1.2/spec.html

Section 3.2.3.1 is particularly relevant (emphasis mine):

3.2.3.1. Node Styles

Each node is presented in some style, depending on its kind. The node style is a presentation detail and is not reflected in the serialization tree or representation graph. There are two groups of styles. Block styles use indentation to denote structure; In contrast, flow styles rely on explicit indicators.

To clarify, a node is any structure in YAML, including arrays (called sequences in the spec). The single-line style is called a flow sequence (see section 7.4.1) and the multi-line style is called a block sequence (section 8.2.1). A compliant parser will deserialize both into identical objects.
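For PyYAML specifically, the claim can be checked by loading both forms and comparing the results. This is only a minimal sketch that assumes PyYAML is installed; the variable names are made up for illustration, and it demonstrates nothing about other parsers beyond what the spec requires of them.

```python
import yaml

# Block sequence, as in the original file.
block_style = """\
key:
- value1
- value2
- value3
"""

# Flow sequence, as in the converted file.
flow_style = "key: [value1,value2,value3]\n"

block_data = yaml.safe_load(block_style)
flow_data = yaml.safe_load(flow_style)

# Both presentations load into the same Python object.
print(block_data == flow_data)
print(block_data)
```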



Answer 2:

As Jordan already pointed out, the node style is a presentation detail, and the output is equivalent to your input.

With PyYAML you can get the same block style output by using the default_flow_style keyword when dumping:

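import sys
import yaml

# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load
yaml.dump(yaml.safe_load("""\
key:
- value1
- value2
- value3
"""), sys.stdout, default_flow_style=False)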

gives you:

key:
- value1
- value2
- value3
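To see both styles side by side, the same data can be dumped with default_flow_style set explicitly each way. This is a rough sketch assuming PyYAML is installed; the names data, block_text and flow_text are illustrative only. As far as I know, the default value of default_flow_style has differed between PyYAML releases, so passing it explicitly is the safer choice.

```python
import yaml

data = {"key": ["value1", "value2", "value3"]}

# Block style: one sequence entry per line.
block_text = yaml.safe_dump(data, default_flow_style=False)

# Flow style: a single line with explicit [ ] and { } indicators.
flow_text = yaml.safe_dump(data, default_flow_style=True)

print(block_text)
print(flow_text)

# Whichever style is emitted, loading it back yields the original data.
assert yaml.safe_load(block_text) == data
assert yaml.safe_load(flow_text) == data
```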

If you use the round-trip capabilities of ruamel.yaml (disclaimer: I am the author of that package), you can do:

import sys
from ruamel.yaml import YAML

yaml_str = """\
key:
- value1
- value2  # this is the second value
- value3
"""

# YAML() defaults to round-trip mode (typ='rt'); recent ruamel.yaml
# releases no longer accept the older load(..., Loader=RoundTripLoader) call
yaml = YAML()
data = yaml.load(yaml_str)

yaml.dump(data, sys.stdout)

to get:

key:
- value1
- value2  # this is the second value
- value3

Round-tripping preserves not only the flow/block style but also the comment, the key ordering, and a few other details, all transparently. This makes comparison (e.g. when checking the YAML file into a revision control system) much easier.

For the service reading the YAML file, none of this makes a difference, but it does make it easier to check whether you are transforming things correctly.