How can I control what scalar form PyYAML uses for my data?

Posted 2019-01-07 14:54

I've got an object with a short string attribute, and a long multi-line string attribute. I want to write the short string as a YAML quoted scalar, and the multi-line string as a literal scalar:

my_obj.short = "Hello"
my_obj.long = "Line1\nLine2\nLine3"

I'd like the YAML to look like this:

short: "Hello"
long: |
  Line1
  Line2
  Line3

How can I instruct PyYAML to do this? If I call yaml.dump(my_obj), it produces a dict-like output:

{long: 'line1

    line2

    line3

    ', short: Hello}

(Not sure why long is double-spaced like that...)

Can I dictate to PyYAML how to treat my attributes? I'd like to affect both the order and style.

4 answers
Reply #2 · 小情绪 Triste · 2019-01-07 15:34

Based on the answers to "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?":

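import yaml
from collections import OrderedDict

# str subclasses used as markers for the desired scalar style
class quoted(str):
    pass

def quoted_presenter(dumper, data):
    # style='"' forces a double-quoted scalar
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)

class literal(str):
    pass

def literal_presenter(dumper, data):
    # style='|' forces a literal block scalar
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    # Passing items() instead of the mapping itself keeps PyYAML from sorting the keys
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)

# A list of pairs keeps the key order even on Python < 3.6,
# where keyword-argument order is not preserved
d = OrderedDict([('short', quoted("Hello")), ('long', literal("Line1\nLine2\nLine3\n"))])

print(yaml.dump(d))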

Output

short: "Hello"
long: |
  Line1
  Line2
  Line3
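The same idea can be applied directly to an object like the one in the question, without registering anything on PyYAML's global yaml.Dumper. This is a minimal sketch assuming a recent PyYAML; MyObj and BlockStyleDumper are hypothetical names. The representer for MyObj passes a list of (key, value) pairs, so the keys come out in the listed order rather than sorted, and wraps each value in the marker type for the style it should get.

```python
import yaml

class quoted(str):
    """Marker type: dump as a double-quoted scalar."""

class literal(str):
    """Marker type: dump as a literal block scalar."""

class MyObj:
    """Hypothetical stand-in for the question's object."""
    def __init__(self, short, long):
        self.short = short
        self.long = long

class BlockStyleDumper(yaml.SafeDumper):
    """Hypothetical Dumper subclass; representers added to it stay local to it."""

def quoted_presenter(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style='"')

def literal_presenter(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")

def my_obj_presenter(dumper, obj):
    # A list of pairs (not a dict) fixes the key order; the wrappers fix the styles
    return dumper.represent_mapping(
        "tag:yaml.org,2002:map",
        [("short", quoted(obj.short)), ("long", literal(obj.long))],
    )

BlockStyleDumper.add_representer(quoted, quoted_presenter)
BlockStyleDumper.add_representer(literal, literal_presenter)
BlockStyleDumper.add_representer(MyObj, my_obj_presenter)

my_obj = MyObj("Hello", "Line1\nLine2\nLine3")
print(yaml.dump(my_obj, Dumper=BlockStyleDumper))
```

Because the long string in the question has no trailing newline, PyYAML writes it with the |- (strip) chomping indicator instead of a bare |.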
Reply #3 · 够拽才男人 · 2019-01-07 15:34

You can use ruamel.yaml and its round-trip mode, the default for YAML() (disclaimer: I am the author of that package). Apart from doing what you want, it supports the YAML 1.2 specification (from 2009) and has several other improvements:

import sys
from ruamel.yaml import YAML

yaml_str = """\
short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability
"""

yaml = YAML()
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

gives:

short: "Hello"  # does keep the quotes, but need to tell the loader
long: |
  Line1
  Line2
  Line3
folded: >
  some like
  explicit folding
  of scalars
  for readability

(including the comment, starting in the same column as before)

You can also create this output from scratch, but then you need to supply the extra information yourself, e.g. the explicit positions at which to fold.

Reply #4 · 来,给爷笑一个 · 2019-01-07 15:39

Inspired by @lbt's approach, I came up with this code:

import yaml

def str_presenter(dumper, data):
    if len(data.splitlines()) > 1:  # check for multi-line string
        return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
    return dumper.represent_scalar('tag:yaml.org,2002:str', data)

# Registers on yaml.Dumper, so it affects yaml.dump() but not yaml.safe_dump()
yaml.add_representer(str, str_presenter)

This makes every multi-line string a block literal.

I wanted to avoid the monkey-patching part. Full credit to @lbt and @J.F.Sebastian.
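Applied to the question's data, str_presenter leaves "Hello" plain and turns the multi-line value into a literal block. One caveat worth knowing: PyYAML's emitter refuses block style for a string that has a space right before a line break and silently falls back to a double-quoted scalar. A sketch, assuming PyYAML 5.1 or later for the sort_keys argument:

```python
import yaml

def str_presenter(dumper, data):
    # Same logic as the answer: multi-line strings become literal block scalars
    if len(data.splitlines()) > 1:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)

yaml.add_representer(str, str_presenter)

clean = {"short": "Hello", "long": "Line1\nLine2\nLine3"}
print(yaml.dump(clean, sort_keys=False))  # sort_keys needs PyYAML >= 5.1

# A space right before a line break: the emitter drops the requested '|'
messy = {"long": "Line1 \nLine2"}
print(yaml.dump(messy))
```

If some output stubbornly stays quoted, stripping trailing whitespace from each line before dumping is usually the fix.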

Reply #5 · 兄弟一词,经得起流年. · 2019-01-07 15:55

I wanted any input containing a \n to be a block literal. Using the code in yaml/representer.py as a base, I got:

# -*- coding: utf-8 -*-
import yaml

def should_use_block(value):
    # Line-break characters: LF, CR, the \x1c-\x1e separators, NEL, LS, PS
    for c in u"\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029":
        if c in value:
            return True
    return False

def my_represent_scalar(self, tag, value, style=None):
    # Only pick a style when the caller did not request one explicitly
    if style is None:
        if should_use_block(value):
            style = '|'
        else:
            style = self.default_style

    # The rest mirrors BaseRepresenter.represent_scalar
    node = yaml.representer.ScalarNode(tag, value, style=style)
    if self.alias_key is not None:
        self.represented_objects[self.alias_key] = node
    return node


a = {'short': "Hello", 'multiline': """Line1
Line2
Line3
""", 'multiline-unicode': u"""Lêne1
Lêne2
Lêne3
"""}

print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))
# Monkey-patch: affects every representer and Dumper from here on
yaml.representer.BaseRepresenter.represent_scalar = my_represent_scalar
print(yaml.dump(a))
print(yaml.dump(a, allow_unicode=True))

Output before the override:

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n", short: Hello}

{multiline: 'Line1

    Line2

    Line3

    ', multiline-unicode: 'Lêne1

    Lêne2

    Lêne3

    ', short: Hello}

After the override:

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: "L\xEAne1\nL\xEAne2\nL\xEAne3\n"
short: Hello

multiline: |
  Line1
  Line2
  Line3
multiline-unicode: |
  Lêne1
  Lêne2
  Lêne3
short: Hello
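To get the same effect without patching BaseRepresenter for the whole process, one option is to override represent_scalar on a Dumper subclass and pass that subclass explicitly. A sketch, assuming Python 3; BlockDumper and LINE_BREAKS are hypothetical names, and the line-break set is copied from should_use_block above.

```python
import yaml

# Same characters as should_use_block() in the answer
LINE_BREAKS = "\u000a\u000d\u001c\u001d\u001e\u0085\u2028\u2029"

class BlockDumper(yaml.Dumper):
    """Only dumps that pass Dumper=BlockDumper get the block style."""

    def represent_scalar(self, tag, value, style=None):
        # Respect an explicitly requested style; otherwise use '|' for multi-line values
        if style is None and any(c in value for c in LINE_BREAKS):
            style = "|"
        return super().represent_scalar(tag, value, style)

a = {
    "short": "Hello",
    "multiline": "Line1\nLine2\nLine3\n",
    "multiline-unicode": "Lêne1\nLêne2\nLêne3\n",
}
print(yaml.dump(a, Dumper=BlockDumper, allow_unicode=True))
print(yaml.dump(a))  # the default Dumper is untouched
```

As in the original answer, the non-ASCII value only becomes a block literal when allow_unicode=True; otherwise the emitter needs escapes and falls back to a double-quoted scalar.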