Is it possible to use PyYAML to read a text file w

Posted 2019-08-10 02:02

Question:

I'm sorry, I know very little about either YAML or PyYAML, but I fell in love with the idea of supporting a configuration file written in the same style used by "Jekyll" (http://jekyllrb.com/docs/frontmatter/), which AFAIK has these "YAML Front Matter" blocks that look very cool and sexy to me.
So I installed PyYAML on my computer and I wrote a small file with this block of text:

---
First Name: John
Second Name: Doe
Born: Yes
---

Lorem ipsum dolor sit amet, consectetur adipiscing elit,  
sed do eiusmod tempor incididunt ut labore et dolore magna  
aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco 
laboris nisi ut aliquip ex ea commodo consequat.

Then I tried to read this text file with Python 3.4 and PyYAML by using this code:

import yaml

stream = open("test.yaml")
a = stream.read()
b = yaml.load(a)

But obviously it's not working, and Python displays this error message:

Traceback (most recent call last):
  File "<pyshell#62>", line 1, in <module>
    b = yaml.load(a)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/__init__.py", line 72, in load
    return loader.get_single_data()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/constructor.py", line 35, in get_single_data
    node = self.get_single_node()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/yaml/composer.py", line 43, in get_single_node
    event.start_mark)
yaml.composer.ComposerError: expected a single document in the stream
  in "<unicode string>", line 2, column 1:
    First Name: John
    ^
but found another document
  in "<unicode string>", line 5, column 1:
    ---
    ^

Could you help me, please?
Did I write the code the wrong way, or does this mean that PyYAML can't handle YAML front matter blocks?
Is there anything else I could try with PyYAML, or do I have to write my own parser using regex?

Thank you very much for your time!

Answer 1:

The Python yaml library (PyYAML) does not support reading YAML that is embedded in a larger document. Here is a utility function that extracts the YAML text, so you can parse it on its own before reading the remainder of the file:

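#!/usr/bin/env python3

import sys
import yaml

def get_yaml(f):
  # Remember the position so we can rewind if there is no front matter.
  pointer = f.tell()
  if f.readline() != '---\n':
    f.seek(pointer)
    return ''
  # Read lines until EOF, stopping early at the closing '---' line.
  readline = iter(f.readline, '')
  readline = iter(readline.__next__, '---\n')
  return ''.join(readline)


for filename in sys.argv[1:]:
  with open(filename) as f:
    # safe_load only builds plain Python objects (dict, list, str, ...).
    config = yaml.safe_load(get_yaml(f))
    text = f.read()
    print("TEXT from", filename)
    print(text)
    print("CONFIG from", filename)
    print(config)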


Answer 2:

You can accomplish this without any custom parsing by calling yaml.safe_load_all() (the safe variant of yaml.load_all()) instead. It returns a generator whose first item is the front matter as a dict. The second item is the rest of the document, parsed as a second YAML document; for plain prose like the example above, it comes back as a single string with line breaks folded into spaces:

import yaml

with open('some-file-with-front-matter.md') as f:
    front_matter, content = list(yaml.safe_load_all(f))[:2]

If you just want the front matter it's even simpler:

import yaml

with open('some-file-with-front-matter.md') as f:
    front_matter = next(yaml.safe_load_all(f))

This works because load_all() and safe_load_all() are meant for loading several YAML documents from the same stream, delimited by ---. The safe variant matters when the YAML comes from an unknown source, because it only builds plain Python objects such as dicts, lists, and strings.
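
One caveat with this approach: everything after the closing --- is also parsed as YAML, so a body that is not valid YAML (typical Markdown, for instance) can raise a parsing error once the second document is read. Below is a minimal sketch of an alternative that splits the text first and hands only the front matter to PyYAML. It assumes the file starts with a bare --- line and that the front matter ends at the next bare --- line; split_front_matter and load_front_matter are hypothetical helper names, not part of PyYAML.

```python
def split_front_matter(text):
    """Return (front_matter_text, body); front_matter_text is '' if absent."""
    lines = text.splitlines(keepends=True)
    # Front matter must start on the very first line with a bare '---'.
    if not lines or lines[0].rstrip() != '---':
        return '', text
    # Look for the closing '---' line.
    for i in range(1, len(lines)):
        if lines[i].rstrip() == '---':
            return ''.join(lines[1:i]), ''.join(lines[i + 1:])
    # No closing delimiter: treat the whole text as body.
    return '', text


def load_front_matter(text):
    """Parse the front matter with PyYAML and return (metadata, body)."""
    # Imported here so split_front_matter stays usable without PyYAML.
    import yaml
    front, body = split_front_matter(text)
    metadata = yaml.safe_load(front) if front else {}
    return metadata, body
```

Called on the contents of the sample file from the question, load_front_matter() should return the three keys as a dict and the Lorem ipsum paragraph as the untouched body, line breaks included. Note that PyYAML follows YAML 1.1 rules, so Born: Yes loads as the boolean True rather than the string "Yes".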