Parsing YAML, return with line number

2020-02-22 03:32发布

问题:

I'm making a document generator from YAML data, which would specify which line of the YAML file each item is generated from. What is the best way to do this? So if the YAML file is like this:

- key1: item 1
  key2: item 2
- key1: another item 1
  key2: another item 2

I want something like this:

[
     {'__line__': 1, 'key1': 'item 1', 'key2': 'item 2'},
     {'__line__': 3, 'key1': 'another item 1', 'key2': 'another item 2'},
]

I'm currently using PyYAML, but any other library is OK if I can use it from Python.

回答1:

I've made it by adding hooks to Composer.compose_node and Constructor.construct_mapping:

import yaml
from yaml.composer import Composer
from yaml.constructor import Constructor

def main():
    loader = yaml.Loader(open('data.yml').read())
    def compose_node(parent, index):
        # the line number where the previous token has ended (plus empty lines)
        line = loader.line
        node = Composer.compose_node(loader, parent, index)
        node.__line__ = line + 1
        return node
    def construct_mapping(node, deep=False):
        mapping = Constructor.construct_mapping(loader, node, deep=deep)
        mapping['__line__'] = node.__line__
        return mapping
    loader.compose_node = compose_node
    loader.construct_mapping = construct_mapping
    data = loader.get_single_data()
    print(data)


回答2:

Here's an improved version of puzzlet's answer:

import yaml
from yaml.loader import SafeLoader

class SafeLineLoader(SafeLoader):
    def construct_mapping(self, node, deep=False):
        mapping = super(SafeLineLoader, self).construct_mapping(node, deep=deep)
        # Add 1 so line numbering starts at 1
        mapping['__line__'] = node.start_mark.line + 1
        return mapping

You can use it like this:

data = yaml.load(whatever, Loader=SafeLineLoader)


回答3:

If you are using ruamel.yaml >= 0.9 (of which I am the author), and use the RoundTripLoader, you can access the property lc on collection items to get line and column where they started in the source YAML:

def test_item_04(self):
    data = load("""
     # testing line and column based on SO
     # http://stackoverflow.com/questions/13319067/
     - key1: item 1
       key2: item 2
     - key3: another item 1
       key4: another item 2
        """)
    assert data[0].lc.line == 2
    assert data[0].lc.col == 2
    assert data[1].lc.line == 4
    assert data[1].lc.col == 2

(line and column start counting at 0).

This answer show how to add the lc attribute to string types during loading.