I'm writing a document generator that works from YAML data, and each generated item needs to say which line of the YAML file it came from. What is the best way to do this? For example, if the YAML file looks like this:
```yaml
- key1: item 1
  key2: item 2
- key1: another item 1
  key2: another item 2
```
I want something like this:
```python
[
    {'__line__': 1, 'key1': 'item 1', 'key2': 'item 2'},
    {'__line__': 3, 'key1': 'another item 1', 'key2': 'another item 2'},
]
```
I'm currently using PyYAML, but any other library is OK if I can use it from Python.
I've made it work by adding hooks to `Composer.compose_node` and `Constructor.construct_mapping`:
```python
import yaml
from yaml.composer import Composer
from yaml.constructor import Constructor


def main():
    with open('data.yml') as f:
        loader = yaml.Loader(f.read())

    def compose_node(parent, index):
        # the line number where the previous token has ended (plus empty lines)
        line = loader.line
        node = Composer.compose_node(loader, parent, index)
        node.__line__ = line + 1
        return node

    def construct_mapping(node, deep=False):
        mapping = Constructor.construct_mapping(loader, node, deep=deep)
        mapping['__line__'] = node.__line__
        return mapping

    # Override the two methods on this loader instance only
    loader.compose_node = compose_node
    loader.construct_mapping = construct_mapping
    data = loader.get_single_data()
    print(data)


if __name__ == '__main__':
    main()
```
Here's an improved version of puzzlet's answer:
```python
import yaml
from yaml.loader import SafeLoader


class SafeLineLoader(SafeLoader):
    def construct_mapping(self, node, deep=False):
        mapping = super(SafeLineLoader, self).construct_mapping(node, deep=deep)
        # Add 1 so line numbering starts at 1
        mapping['__line__'] = node.start_mark.line + 1
        return mapping
```
You can use it like this:
```python
# `whatever` is the YAML input: a string or an open file
data = yaml.load(whatever, Loader=SafeLineLoader)
```
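As a quick check, here is the same loader run on the sample document from the question. This is a sketch that assumes PyYAML is installed and that the YAML is passed in as a string; `text` is just an illustrative variable name.

```python
import yaml
from yaml.loader import SafeLoader


class SafeLineLoader(SafeLoader):
    """SafeLoader that records the 1-based starting line of every mapping."""

    def construct_mapping(self, node, deep=False):
        mapping = super().construct_mapping(node, deep=deep)
        # start_mark.line is 0-based; add 1 so numbering starts at 1
        mapping['__line__'] = node.start_mark.line + 1
        return mapping


# The sample document from the question; the two items start on lines 1 and 3
text = """\
- key1: item 1
  key2: item 2
- key1: another item 1
  key2: another item 2
"""

data = yaml.load(text, Loader=SafeLineLoader)
print(data)
```

Each mapping gets its own `__line__` (1 and 3 here), which matches the output asked for in the question. Nested mappings would receive a `__line__` key of their own as well, because every mapping goes through `construct_mapping`.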
If you are using ruamel.yaml >= 0.9 (of which I am the author) and use the `RoundTripLoader`, you can access the `lc` attribute on collection items to get the line and column where they started in the source YAML:
```python
def test_item_04(self):
    # `load` is a round-trip loading helper from the ruamel.yaml test suite
    data = load("""
# testing line and column based on SO
# http://stackoverflow.com/questions/13319067/
- key1: item 1
  key2: item 2
- key3: another item 1
  key4: another item 2
""")
    assert data[0].lc.line == 2
    assert data[0].lc.col == 2
    assert data[1].lc.line == 4
    assert data[1].lc.col == 2
```
(line and column start counting at 0).
This answer shows how to add the `lc` attribute to string types during loading.
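For comparison, plain PyYAML records the same 0-based positions on the nodes it composes; this is the `start_mark` that the `SafeLineLoader` answer reads. The sketch below assumes PyYAML is installed and writes the YAML from the ruamel.yaml test without a leading newline, so the first comment is line 0. `yaml.compose` builds the node graph without constructing Python objects, and each node's `start_mark` gives its starting line and column.

```python
import yaml

# Same YAML as the ruamel.yaml test, starting directly with the first comment
text = """\
# testing line and column based on SO
# http://stackoverflow.com/questions/13319067/
- key1: item 1
  key2: item 2
- key3: another item 1
  key4: another item 2
"""

# compose() returns the root node (a SequenceNode here); its .value holds
# the item nodes, each with a 0-based start_mark.line and start_mark.column
root = yaml.compose(text, Loader=yaml.SafeLoader)
positions = [(item.start_mark.line, item.start_mark.column) for item in root.value]
print(positions)
```

The two mappings start at line 2 and line 4, column 2 (just after `- `), which are the same values the ruamel.yaml assertions check. Comment lines are skipped by the scanner but still counted.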