PyYAML dump format

Posted 2019-01-22 18:35

Question:

I know there are a few questions about this on SO, but I couldn't find what I was looking for.

I'm using pyyaml to read (.load()) a .yml file, modify or add a key, and then write it (.dump()) again. The problem is that I want to keep the file format post-dump, but it changes.

For example, I edit the key en.test.index.few to say "Bye" instead of "Hello"

Python:

import yaml as pyyaml  # assumption: "pyyaml" in this question is an alias for PyYAML's yaml module

with open(path, 'r', encoding="utf-8") as yaml_file:
    self.dict = pyyaml.load(yaml_file)

Then, after changing the key:

with open(path, 'w', encoding = "utf-8") as yaml_file:
    dump = pyyaml.dump(self.dict, default_flow_style = False, allow_unicode = True, encoding = None)
    yaml_file.write( dump )

Yaml:

Before:

en:
  test:
    new: "Bye"
    index:
      few: "Hello"
  anothertest: "Something"

After:

en:
  anothertest: Something
  test:
    index:
      few: Hello
    new: Bye

Is there a way to keep the same format, for example the quotes and the key order? Am I using the wrong tool for this?

I know the original file may not be entirely correct, but I have no control over it (it's a Ruby on Rails i18n file).

Thank you very much.

Answer 1:

Use ruamel.yaml instead.

Library Fight! A Tale of Two Libraries

PyYAML is effectively dead and has been for several years. To compound matters, the official project home at http://pyyaml.org appears to have been taken down recently. This site hosted the PyYAML issue tracker, documentation, and downloads. As of this writing, all are gone. This is nothing short of calamitous. Welcome to just another day in open-source.

ruamel.yaml is actively maintained. Unlike PyYAML, ruamel.yaml supports:

  • YAML <= 1.2. PyYAML only supports YAML <= 1.1. This is vital, as YAML 1.2 intentionally breaks backward compatibility with YAML 1.1 in several edge cases. This would usually be a bad thing. In this case, this renders YAML 1.2 a strict superset of JSON. Since YAML 1.1 is not a strict superset of JSON, this is a good thing.
  • Roundtrip preservation. When calling yaml.dump() to dump a dictionary loaded by a prior call to yaml.load():
    • PyYAML naively ignores all input formatting – including comments, ordering, quoting, and whitespace. Discarded like so much digital refuse into the nearest available bit bucket.
    • ruamel.yaml cleverly respects all input formatting. Everything. The whole stylistic enchilada. The entire literary shebang. All.
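
To make the YAML 1.1 point concrete, here is a small sketch (assuming PyYAML is installed) of one well-known edge case: under YAML 1.1 rules, unquoted yes and no are read as booleans. A YAML 1.2 parser would be expected to keep both as plain strings. The variable names are only illustrative.

```python
import yaml  # PyYAML, which follows YAML 1.1 resolution rules

# Unquoted "yes" and "no" look like ordinary words in an i18n file...
document = "enabled: yes\ncountry: no\n"

# ...but a YAML 1.1 parser resolves them to booleans.
parsed = yaml.safe_load(document)
print(parsed)
```

Quoting the values ("yes", "no") in the source file avoids the surprise, which is one reason preserving quotes on a round trip matters.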

Library Migration: The Trail of Code Tears

Since ruamel.yaml is a PyYAML fork and hence conforms to the PyYAML API, switching from PyYAML to ruamel.yaml in existing applications is typically as simple as replacing all instances of this:

# This imports PyYAML. Stop doing this.
import yaml

...with this:

# This imports "ruamel.yaml". Always do this.
from ruamel import yaml

That's it.

No other changes should be needed. The yaml.load() and yaml.dump() functions should continue to behave as expected – with the added benefits of now supporting YAML 1.2 and actively receiving bug fixes.

Roundtrip Preservation and What It Can Do for You

For backward compatibility with PyYAML, the yaml.load() and yaml.dump() functions do not perform roundtrip preservation by default. To do so, explicitly pass:

  • The optional Loader=ruamel.yaml.RoundTripLoader keyword parameter to yaml.load().
  • The optional Dumper=ruamel.yaml.RoundTripDumper keyword parameter to yaml.dump().

An example kindly "borrowed" from ruamel.yaml documentation:

import ruamel.yaml

inp = """\
# example
name:
  # Yet another Great Duke of Hell. He's not so bad, really.
  family: TheMighty
  given: Ashtaroth
"""

code = ruamel.yaml.load(inp, Loader=ruamel.yaml.RoundTripLoader)
code['name']['given'] = 'Astarte'  # Oh no you didn't.

print(ruamel.yaml.dump(code, Dumper=ruamel.yaml.RoundTripDumper), end='')

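It is done. Comments and key ordering will now be preserved intact. Quoting is preserved only if you additionally pass preserve_quotes=True to the load call, and indentation is normalized rather than copied character for character.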

tl;dr

Always use ruamel.yaml. Never use PyYAML. ruamel.yaml lives. PyYAML is a fetid corpse rotting in the mouldering charnel ground of PyPI.

Long live ruamel.yaml.



Answer 2:

First

PyYAML uses the following code to represent dictionary data:

mapping = list(mapping.items())
try:
    mapping = sorted(mapping)
except TypeError:
    pass

This is why the ordering changes: the key/value pairs are sorted before they are written.
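
If ordering is the only concern, a minimal sketch of a workaround follows. It assumes PyYAML 5.1 or later, where dump() accepts a sort_keys argument, and Python 3.7 or later, where dict keeps insertion order. The data literal mirrors the file in the question.

```python
import yaml

data = {
    "en": {
        "test": {"new": "Bye", "index": {"few": "Hello"}},
        "anothertest": "Something",
    }
}

# Default behaviour: keys are sorted alphabetically on output.
sorted_dump = yaml.safe_dump(data, default_flow_style=False)

# With sort_keys=False the insertion order of the dict is kept.
ordered_dump = yaml.safe_dump(data, default_flow_style=False, sort_keys=False)

print(sorted_dump)
print(ordered_dump)
```

This keeps the order but still drops the quotes, since the quote style is not stored in the loaded data.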

Second

Information about how a scalar was written (with double quotes or not) is lost when the file is read. This is a design principle of the library.

Summary

You can create your own class based on 'Dumper' and override its 'represent_mapping' method to change how dictionaries are written.

To retain the double-quote information for scalars you would also have to create your own class based on 'Loader', but I am afraid that this would affect other classes as well and make the task difficult.
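
As a rough sketch of the Dumper side only, the following uses a marker type instead of overriding represent_mapping, which is a simpler swapped-in technique and not the approach described above. Quoted, QuotingDumper and represent_quoted are hypothetical names made up for this illustration. It does not recover quote information from the input; the caller decides which values to quote.

```python
import yaml


class Quoted(str):
    """Hypothetical marker type: strings that must be written with double quotes."""


class QuotingDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass, so the global SafeDumper is left untouched."""


def represent_quoted(dumper, data):
    # Force the double-quoted style for values wrapped in Quoted.
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style='"')


QuotingDumper.add_representer(Quoted, represent_quoted)

data = {"en": {"test": {"index": {"few": Quoted("Bye")}}}}
out = yaml.dump(data, Dumper=QuotingDumper, default_flow_style=False)
print(out)
```

Because only Quoted values are affected, keys and ordinary strings are still written in PyYAML's default plain style.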



Answer 3:

In my case, I want double quotes (") around a value if it contains a { or a }, and no quotes otherwise. For example:

 en:
   key1: value is 1
   key2: "value is {1}"

To do that, copy the represent_str() function from representer.py in PyYAML and use a different style if the string contains { or }. The version below is the Python 2 implementation (it relies on unicode() and the 'base64' codec):

def represent_str(self, data):
    tag = None
    style = None
    # Add these two lines:
    if '{' in data or '}' in data:
        style = '"'
    try:
        data = unicode(data, 'ascii')
        tag = u'tag:yaml.org,2002:str'
    except UnicodeDecodeError:
        try:
            data = unicode(data, 'utf-8')
            tag = u'tag:yaml.org,2002:str'
        except UnicodeDecodeError:
            data = data.encode('base64')
            tag = u'tag:yaml.org,2002:binary'
            style = '|'
    return self.represent_scalar(tag, data, style=style)

To use it in your code:

import yaml

def represent_str(self, data):
  ...

yaml.add_representer(str, represent_str)

In this case there is no difference between keys and values, and that's enough for me. If you want a different style for keys and values, do the same thing with the represent_mapping function.
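
For Python 3, where str is already Unicode, the same idea can be sketched much more briefly. This is an adaptation under assumptions, not PyYAML's own code: BraceQuotingDumper and represent_str_with_braces are hypothetical names, and the representer is registered on a Dumper subclass rather than globally so other dumps are unaffected.

```python
import yaml


class BraceQuotingDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass that double-quotes strings containing braces."""


def represent_str_with_braces(dumper, data):
    # Use the double-quoted style only when the string contains { or }.
    style = '"' if ("{" in data or "}" in data) else None
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=style)


BraceQuotingDumper.add_representer(str, represent_str_with_braces)

data = {"en": {"key1": "value is 1", "key2": "value is {1}"}}
out = yaml.dump(data, Dumper=BraceQuotingDumper, default_flow_style=False)
print(out)
```

As with the original function, keys pass through the same representer, so a key containing a brace would be quoted too.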