The PyYAML package loads unmarked strings as either unicode or str objects, depending on their content.
I would like to use unicode objects throughout my program (and, unfortunately, can't switch to Python 3 just yet).
Is there an easy way to force PyYAML to always strings load unicode objects? I do not want to clutter my YAML with !!python/unicode
tags.
# Encoding: UTF-8
import yaml
menu= u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
"""
print yaml.load(menu)
Output: ['spam', 'eggs', 'bacon', u'cr\xe8me br\xfbl\xe9e', 'spam']
I would like: [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam']
Here's a version which overrides the PyYAML handling of strings by always outputting
unicode
. In reality, this is probably the identical result of the other response I posted except shorter (i.e. you still need to make sure that strings in custom classes are converted tounicode
or passedunicode
strings yourself if you use custom handlers):(The above gives
[u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam']
)I haven't tested it on
LibYAML
(the c-based parser) as I couldn't compile it though, so I'll leave the other answer as it was.Here's a function you could use to use to replace
str
withunicode
types from the decoded output ofPyYAML
: