I parse the following YAML data in Python:
>>> import yaml
>>> yaml.load("""
... ---
... categories: {1: Yes, 2: No}
... increasing: [00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10]
... ...
... """)
And get this as output:
{'increasing': [0, 1, 2, 3, 4, 5, 6, 7, '08', '09', 10], 'categories': {1: True, 2: False}}
- Why are "Yes" and "No" converted to True and False?
- Why are "08" and "09" parsed as strings whereas the other digits are parsed as numbers with leading zeros truncated?
Your deduction that for 00 to 07 the leading zeros are truncated is incorrect. These are all octal numbers because of the leading 0, and they are interpreted as such.
As octal numbers cannot contain the digits 8 or 9, 08 and 09 cannot be anything but strings, and your YAML parser loads them as such.
This is actually a leftover (kept for backwards compatibility) from YAML 1.1; in YAML 1.2, octal numbers should start with 0o.
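To see why only 08 and 09 come back as strings, here is a minimal standard-library sketch of YAML 1.1 style integer resolution. The function name resolve_plain_scalar and the two reduced patterns are assumptions made for illustration; a real parser uses a longer pattern that also covers binary, hex and sexagesimal forms.

```python
import re

# Simplified patterns (assumption): a leading 0 followed by octal digits is
# octal, anything else must be a plain decimal without a leading 0.
OCTAL = re.compile(r'^0[0-7]+$')
DECIMAL = re.compile(r'^(?:0|[1-9][0-9]*)$')


def resolve_plain_scalar(text):
    """Return an int when a pattern matches, otherwise the untouched string."""
    if OCTAL.match(text):
        return int(text, 8)   # converted as base 8, not "zeros stripped"
    if DECIMAL.match(text):
        return int(text)
    return text               # e.g. '08': neither octal nor decimal


scalars = "00 01 02 03 04 05 06 07 08 09 10 010".split()
print([resolve_plain_scalar(s) for s in scalars])
# [0, 1, 2, 3, 4, 5, 6, 7, '08', '09', 10, 8]
```

The extra 010 at the end is the giveaway: it becomes 8, not 10, so the zeros are not simply dropped.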
That Yes and No are loaded as True and False respectively is also a YAML 1.1-ism. The 1.2 specification no longer refers to these alternatives. If you quote those strings, they will not be converted.
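A rough way to picture the difference between the two specification versions, and why quoting helps, using only the standard library. The helper resolve_bool and its quoted flag are hypothetical; the 1.1 pattern here leaves out the single-letter y/n forms.

```python
import re

# Boolean spellings of the YAML 1.1 bool type (without single-letter y/n).
BOOL_11 = re.compile(
    r'^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE'
    r'|on|On|ON|off|Off|OFF)$')
# YAML 1.2 core schema: only the true/false spellings remain.
BOOL_12 = re.compile(r'^(?:true|True|TRUE|false|False|FALSE)$')

TRUTHY = {'yes', 'true', 'on'}


def resolve_bool(text, quoted=False, pattern=BOOL_11):
    """Hypothetical helper: quoted scalars are never implicitly converted."""
    if quoted or not pattern.match(text):
        return text
    return text.lower() in TRUTHY


print(resolve_bool('Yes'))                   # True
print(resolve_bool('No'))                    # False
print(resolve_bool('Yes', quoted=True))      # Yes
print(resolve_bool('Yes', pattern=BOOL_12))  # Yes
```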
You can relatively easily build a resolver that doesn't accept the Yes/No/On/Off variants for True/False by adding the following rule:
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:bool',
re.compile(u'''^(?:true|True|TRUE|false|False|FALSE)$''', re.X),
list(u'tTfF'))
or by using the normal Resolver and deleting the appropriate start symbol entries:
import ruamel.yaml as yaml
from ruamel.yaml.resolver import Resolver
yaml_str = """\
categories: {1: Yes, 2: No}
"""
for ch in list(u'yYnNoO'):
    del Resolver.yaml_implicit_resolvers[ch]
data = yaml.load(yaml_str, Loader=yaml.Loader)
print(data)
gives you:
{'categories': {1: 'Yes', 2: 'No'}}
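What "deleting the start symbol entries" does can be sketched with a toy lookup table keyed on the first character of a scalar. This is a simplified model with made-up names (table, add_implicit_resolver, resolve), loosely following how implicit resolvers are registered, not the library's actual code. It also shows a side effect worth knowing: anything else registered under the same first character goes away too.

```python
import re

# Toy model (assumption): first character -> list of (tag, compiled pattern).
table = {}


def add_implicit_resolver(tag, pattern, first_chars):
    for ch in first_chars:
        table.setdefault(ch, []).append((tag, pattern))


def resolve(text):
    """Return the tag of the first matching pattern, or 'str' if none match."""
    for tag, pattern in table.get(text[:1], []):
        if pattern.match(text):
            return tag
    return 'str'


add_implicit_resolver(
    'bool',
    re.compile(r'^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE'
               r'|on|On|ON|off|Off|OFF)$'),
    'yYnNtTfFoO')
add_implicit_resolver(
    'null', re.compile(r'^(?:~|null|Null|NULL|)$'), ['~', 'n', 'N', ''])

samples = ('Yes', 'No', 'true', 'null')
before = [resolve(s) for s in samples]

# Drop every resolver that starts with one of these characters.
for ch in 'yYnNoO':
    del table[ch]
after = [resolve(s) for s in samples]

print(before)  # ['bool', 'bool', 'bool', 'null']
print(after)   # ['str', 'str', 'bool', 'str']
```

In this model, null stops being recognised as well, because it shares the n entry with no. If that matters, re-adding only the patterns you want (as in the add_implicit_resolver rule) is the more precise route.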
Making all digit-only strings that start with 0 be recognised as normal integers is not so simple: if you only change the implicit resolver for int so that it passes on strings starting with 0, you get a parsing problem, because 08 is then converted as octal. The constructor has to be adapted as well ¹:
import re
import ruamel.yaml as yaml
from ruamel.yaml.reader import Reader
from ruamel.yaml.resolver import BaseResolver, Resolver
from ruamel.yaml.scanner import RoundTripScanner
from ruamel.yaml.parser_ import Parser
from ruamel.yaml.composer import Composer
from ruamel.yaml.constructor import RoundTripConstructor
from ruamel.yaml import RoundTripLoader
from ruamel.yaml.compat import to_str
yaml_str = """\
categories: {1: Yes, 2: No}
increasing: [00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10]
"""
class MyResolver(BaseResolver):
    pass
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:bool',
re.compile(u'''^(?:true|True|TRUE|false|False|FALSE)$''', re.X),
list(u'tTfF'))
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:float',
re.compile(u'''^(?:
[-+]?(?:[0-9][0-9_]*)\\.[0-9_]*(?:[eE][-+]?[0-9]+)?
|[-+]?(?:[0-9][0-9_]*)(?:[eE][-+]?[0-9]+)
|\\.[0-9_]+(?:[eE][-+][0-9]+)?
|[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\\.[0-9_]*
|[-+]?\\.(?:inf|Inf|INF)
|\\.(?:nan|NaN|NAN))$''', re.X),
list(u'-+0123456789.'))
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:int',
re.compile(u'''^(?:[-+]?0b[0-1_]+
|[-+]?[0-9]+
|[-+]?0o?[0-7_]+
|[-+]?(?:0|[1-9][0-9_]*)
|[-+]?0x[0-9a-fA-F_]+
|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+)$''', re.X),
list(u'-+0123456789'))
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:merge',
re.compile(u'^(?:<<)$'),
[u'<'])
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:null',
re.compile(u'''^(?: ~
|null|Null|NULL
| )$''', re.X),
[u'~', u'n', u'N', u''])
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:timestamp',
re.compile(u'''^(?:[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]
|[0-9][0-9][0-9][0-9] -[0-9][0-9]? -[0-9][0-9]?
(?:[Tt]|[ \\t]+)[0-9][0-9]?
:[0-9][0-9] :[0-9][0-9] (?:\\.[0-9]*)?
(?:[ \\t]*(?:Z|[-+][0-9][0-9]?(?::[0-9][0-9])?))?)$''', re.X),
list(u'0123456789'))
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:value',
re.compile(u'^(?:=)$'),
[u'='])
# The following resolver is only for documentation purposes. It cannot work
# because plain scalars cannot start with '!', '&', or '*'.
MyResolver.add_implicit_resolver(
u'tag:yaml.org,2002:yaml',
re.compile(u'^(?:!|&|\\*)$'),
list(u'!&*'))
class MyRoundTripConstructor(RoundTripConstructor):
    def construct_yaml_int(self, node):
        value = to_str(self.construct_scalar(node))
        value = value.replace('_', '')
        sign = +1
        if value[0] == '-':
            sign = -1
        if value[0] in '+-':
            value = value[1:]
        if value == '0':
            return 0
        elif value.startswith('0b'):
            return sign*int(value[2:], 2)
        elif value.startswith('0x'):
            return sign*int(value[2:], 16)
        elif value.startswith('0o'):
            return sign*int(value[2:], 8)
        # disabled on purpose: a bare leading 0 no longer selects octal
        #elif value[0] == '0':
        #    return sign*int(value, 8)
        elif ':' in value:
            # sexagesimal (base 60), e.g. 1:30 -> 90
            digits = [int(part) for part in value.split(':')]
            digits.reverse()
            base = 1
            value = 0
            for digit in digits:
                value += digit*base
                base *= 60
            return sign*value
        else:
            return sign*int(value)
MyRoundTripConstructor.add_constructor(
u'tag:yaml.org,2002:int',
MyRoundTripConstructor.construct_yaml_int)
class MyRoundTripLoader(Reader, RoundTripScanner, Parser,
                        Composer, MyRoundTripConstructor, MyResolver):
    def __init__(self, stream):
        Reader.__init__(self, stream)
        RoundTripScanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        MyRoundTripConstructor.__init__(self)
        MyResolver.__init__(self)
for ch in list(u'yYnNoO'):
    del Resolver.yaml_implicit_resolvers[ch]
data = yaml.load(yaml_str, Loader=MyRoundTripLoader)
print(data['increasing'])
and that prints:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
(It also loads Yes/No as strings, since the recognition patterns for them are never inserted in the internal lookup table of MyResolver.)
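The conversion step on its own, stripped of the loader machinery, can be tried with a small standalone function. The name to_int is hypothetical; the logic follows the constructor method, with octal accepted only in the explicit 0o form.

```python
def to_int(text):
    """Hypothetical standalone conversion: a bare leading 0 means decimal."""
    value = text.replace('_', '')
    sign = -1 if value[0] == '-' else 1
    if value[0] in '+-':
        value = value[1:]
    if value.startswith('0b'):
        return sign * int(value[2:], 2)
    if value.startswith('0x'):
        return sign * int(value[2:], 16)
    if value.startswith('0o'):
        return sign * int(value[2:], 8)
    if ':' in value:
        # sexagesimal: accumulate left to right in base 60
        total = 0
        for part in value.split(':'):
            total = total * 60 + int(part)
        return sign * total
    return sign * int(value)


print([to_int(s) for s in '00 01 07 08 09 10'.split()])
# [0, 1, 7, 8, 9, 10]
print(to_int('0o17'), to_int('0x1F'), to_int('-0b101'), to_int('1:30'))
# 15 31 -5 90
```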
¹ I used ruamel.yaml for this, of which I am the author. PyYAML, on which ruamel.yaml is based, should be able to support a similar derivation.
Yes and No have special meanings in YAML. Have a look at the Wikipedia article. To circumvent this you could change your YAML to include quotes, so that it looks like this:
>>> yaml.load("""
... ---
... categories: {1: "Yes", 2: "No"}
... increasing: [00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10]
... ...
... """)
Regarding the leading zeroes of 08 and 09, I am not quite sure why this is happening, but it doesn't seem to be a Python issue.