Parse YAML and assume a certain path is always a string

Posted 2019-06-25 14:07

Question:

I am using the YAML parser from http://pyyaml.org and I want it to always interpret certain fields as strings, but I can't figure out how add_path_resolver() works.

For example, the parser assumes that "version" is a float:

network:
- name: apple
- name: orange
version: 2.3
site: banana

Some files have "version: 2" (which is interpreted as an int) or "version: 2.3 alpha" (which is interpreted as a str).

I want them to always be interpreted as a str.

It seems that yaml.add_path_resolver() should let me say "when you see version:, always interpret it as a str", but it is not documented very well. My best guess is:

yaml.add_path_resolver(u'!root', ['version'], kind=str)

But that doesn't work.

Suggestions on how to get my version field to always be a string?

P.S. Here are some examples of different "version" strings and how they are interpreted:

(Pdb) import yaml
(Pdb) import pprint
(Pdb) pprint.pprint(yaml.load("---\nnetwork:\n- name: apple\n- name: orange\nversion: 2\nsite: banana"))
{'network': [{'name': 'apple'}, {'name': 'orange'}],
 'site': 'banana',
 'version': 2}
(Pdb) pprint.pprint(yaml.load("---\nnetwork:\n- name: apple\n- name: orange\nversion: 2.3\nsite: banana"))
{'network': [{'name': 'apple'}, {'name': 'orange'}],
 'site': 'banana',
 'version': 2.2999999999999998}
(Pdb) pprint.pprint(yaml.load("---\nnetwork:\n- name: apple\n- name: orange\nversion: 2.3 alpha\nsite: banana"))
{'network': [{'name': 'apple'}, {'name': 'orange'}],
 'site': 'banana',
 'version': '2.3 alpha'}
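The session above calls yaml.load() without a Loader argument, which PyYAML 5.1 and later warn about and PyYAML 6.0 rejects with a TypeError. A minimal sketch of the same check on a current PyYAML, assuming yaml.safe_load is acceptable for these files (load_version and TEMPLATE are hypothetical names used only here):

```python
import yaml

# The document from the session above, with the version value left as a placeholder.
TEMPLATE = "---\nnetwork:\n- name: apple\n- name: orange\nversion: {}\nsite: banana\n"


def load_version(raw):
    """Hypothetical helper: parse TEMPLATE with `raw` as the version and return that value."""
    return yaml.safe_load(TEMPLATE.format(raw))["version"]


for raw in ["2", "2.3", "2.3 alpha"]:
    value = load_version(raw)
    # The implicit resolvers pick int, float and str respectively.
    print(repr(raw), "->", repr(value), type(value).__name__)
```

The 2.2999999999999998 in the session is just how older Python versions displayed the float 2.3; the parsed type is the same.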

Answer 1:

By far the easiest solution is not to use the bare .load() (which is unsafe anyway), but to call it with Loader=BaseLoader, which loads every scalar as a string:

import yaml

yaml_str = """\
network:
- name: apple
- name: orange
version: 2.3
old: 2
site: banana
"""

data = yaml.load(yaml_str, Loader=yaml.BaseLoader)
print(data)

gives:

{'network': [{'name': 'apple'}, {'name': 'orange'}], 'version': '2.3', 'old': '2', 'site': 'banana'}
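Keep in mind that BaseLoader applies to the whole document, not just to version: it has no implicit resolvers at all, so booleans, integers and nulls elsewhere also come back as strings. A short check, assuming the extra keys below (enabled, count, missing) are made up for illustration:

```python
import yaml

doc = """\
version: 2.3
enabled: true
count: 10
missing: null
"""

# BaseLoader resolves every plain scalar to the default str tag.
data = yaml.load(doc, Loader=yaml.BaseLoader)
print(data)
```

If only a few fields need to stay strings, this means converting the other fields back by hand.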


Answer 2:

From the current source:

 # Note: `add_path_resolver` is experimental.  The API could be changed.

It appears to be incomplete (yet?). As far as I can tell, the call that should work is:

yaml.add_path_resolver(u'tag:yaml.org,2002:str', ['version'], yaml.ScalarNode)

However, it doesn't.

It appears that the implicit type resolvers are checked first, and if one of them matches, the user-defined path resolvers are never consulted. See resolver.py for details (look at the resolve method).
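A minimal sketch that reproduces this, assuming a private SafeLoader subclass (VersionLoader is a hypothetical name) so the global loaders are left untouched. The path resolver is registered the same way as above, yet version still comes back as a float, because the implicit float resolver matches 2.3 before the path resolvers are looked at:

```python
import yaml


class VersionLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that only exists to hold the path resolver."""


# "A scalar under the root-level key 'version' should get the str tag."
VersionLoader.add_path_resolver("tag:yaml.org,2002:str", ["version"], yaml.ScalarNode)

data = yaml.load("version: 2.3\nsite: banana\n", Loader=VersionLoader)
print(type(data["version"]).__name__)  # -> float: the implicit resolver wins
```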

I suggest changing your version entry to

version: !!str 2.3

This will always coerce it to a string.
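If the input files cannot be edited, roughly the same effect as writing !!str by hand can be had in code: compose the node tree with SafeLoader, overwrite the tag of the top-level version value with the standard str tag, and only then construct Python objects. A sketch under those assumptions (load_with_str_keys is a hypothetical helper, and it only looks at keys of the top-level mapping):

```python
import yaml

STR_TAG = "tag:yaml.org,2002:str"  # the full tag that "!!str" expands to


def load_with_str_keys(text, keys):
    """Hypothetical helper: load one document, forcing scalar values of top-level `keys` to str.

    Nodes are retagged after composition, so the scalar's original text
    (e.g. "2.30") is kept verbatim; any existing tag on those values is overwritten.
    """
    loader = yaml.SafeLoader(text)
    try:
        root = loader.get_single_node()
        if isinstance(root, yaml.MappingNode):
            for key_node, value_node in root.value:
                if (isinstance(key_node, yaml.ScalarNode)
                        and key_node.value in keys
                        and isinstance(value_node, yaml.ScalarNode)):
                    value_node.tag = STR_TAG
        return loader.construct_document(root) if root is not None else None
    finally:
        loader.dispose()


doc = "network:\n- name: apple\n- name: orange\nversion: 2.3\nold: 2\nsite: banana\n"
print(load_with_str_keys(doc, {"version"}))
```

Unlike BaseLoader, other fields such as old keep their normal int/float/bool types.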