Why is PyYAML spending so much time just parsing a YAML file?

Posted 2019-03-19 06:18

I am parsing a YAML file with around 6500 lines with this format:

foo1:
  bar1:
    blah: { name: "john", age: 123 }
  metadata: { whatever1: "whatever", whatever2: "whatever" }
  stuff:
    thing1: 
      bluh1: { name: "Doe1", age: 123 }
      bluh2: { name: "Doe2", age: 123 }
    thing2:
    ...
    thingN:
foo2:
...
fooN:

I just want to parse it with the PyYAML library (I don't think there are other alternatives in Python; see "How can I parse a YAML file in Python").

Just for testing, I wrote this code to parse my file:

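import yaml

config_file = "/path/to/file.yaml"

# Loader is required since PyYAML 6; yaml.Loader is the pure-Python default of older versions
with open(config_file, "r") as stream:
    sensors = yaml.load(stream, Loader=yaml.Loader)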

Running the script under the time command gives:

real    0m3.906s
user    0m3.672s
sys     0m0.100s

Those values don't seem very good. I wanted to run the same test with JSON, so I first converted the same YAML file to JSON:

import json

config_file = "/path/to/file.json"

with open(config_file, "r") as stream:
    sensors = json.load(stream)  # Read the JSON config file

But the execution time is far better:

real    0m0.058s
user    0m0.032s
sys     0m0.008s

What is the main reason PyYAML takes so much longer to parse the YAML file than the json module takes to parse the JSON one? Is it a problem with PyYAML, or is the YAML format itself hard to parse? (Probably the former.)
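For readers who want to reproduce the comparison, the YAML-to-JSON conversion step mentioned above might look roughly like the sketch below. It is not the code the question used; `yaml_to_json` and `SAMPLE` are hypothetical names, and it assumes the document only contains types JSON can represent (strings, numbers, booleans, null, lists, and mappings with string keys).

```python
import json

import yaml  # PyYAML (third-party)

# Hypothetical sample shaped like the file described in the question.
SAMPLE = """\
foo1:
  bar1:
    blah: { name: "john", age: 123 }
  metadata: { whatever1: "whatever", whatever2: "whatever" }
"""


def yaml_to_json(yaml_text: str) -> str:
    """Parse YAML text and re-serialize the result as JSON text."""
    data = yaml.safe_load(yaml_text)  # safe_load builds plain Python objects only
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    print(yaml_to_json(SAMPLE))
```

YAML values such as dates have no JSON equivalent, so `json.dumps` would raise a `TypeError` for them; a real file may need extra handling.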

EDIT:

I added another example with Ruby and YAML:

require 'yaml'

sensors = YAML.load_file('/path/to/file.yaml')

And the execution time is good (or at least not as bad as in the PyYAML example):

real    0m0.278s
user    0m0.240s
sys     0m0.032s

1 answer
时光不老,我们不散
Reply #2 · 2019-03-19 07:11

According to the docs, you should use CLoader/CSafeLoader (and CDumper), the LibYAML-based C implementations, which are much faster than the pure-Python ones:

import yaml

# Prefer the LibYAML-based C loader; fall back to pure Python if it is unavailable
try:
    from yaml import CLoader as Loader
except ImportError:
    from yaml import Loader

config_file = "test.yaml"

with open(config_file, "r") as stream:
    sensors = yaml.load(stream, Loader=Loader)

This gives me

real    0m0.503s

instead of

real    0m2.714s
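To check the difference on your own machine without timing the whole interpreter start-up, a rough in-process comparison could look like the sketch below. It is only an illustration: it uses the safe variants (SafeLoader and CSafeLoader) rather than the Loader/CLoader pair above, the helper names `build_sample`, `time_loader` and `compare` are hypothetical, and the synthetic document only imitates the shape of the file in the question. CSafeLoader exists only if PyYAML was built with LibYAML bindings, so the code skips it when it is missing.

```python
import time

import yaml  # PyYAML (third-party)


def build_sample(n_things: int = 200) -> str:
    """Build a synthetic YAML document shaped like the one in the question."""
    lines = ["foo1:", "  stuff:"]
    for i in range(n_things):
        lines.append(f"    thing{i}:")
        lines.append('      bluh1: { name: "Doe1", age: 123 }')
        lines.append('      bluh2: { name: "Doe2", age: 123 }')
    return "\n".join(lines) + "\n"


def time_loader(text: str, loader) -> float:
    """Return the seconds taken to parse text once with the given loader class."""
    start = time.perf_counter()
    yaml.load(text, Loader=loader)
    return time.perf_counter() - start


def compare(text: str) -> dict:
    """Time the pure-Python loader and, if available, the LibYAML-based one."""
    results = {"SafeLoader": time_loader(text, yaml.SafeLoader)}
    c_loader = getattr(yaml, "CSafeLoader", None)  # absent without LibYAML bindings
    if c_loader is not None:
        results["CSafeLoader"] = time_loader(text, c_loader)
    return results


if __name__ == "__main__":
    for name, seconds in compare(build_sample()).items():
        print(f"{name}: {seconds:.4f}s")
```

If only SafeLoader appears in the output, PyYAML was installed without the C extension, and the fallback branch of the try/except above is the one being used.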