Getting symbols with Lark parsing

Posted 2020-07-13 10:25

Question:

I'm trying to parse some pseudo-code I'm writing, and I'm having trouble getting values for symbols. The input parses successfully, but the resulting tree doesn't give me a value the way it does with "regular" characters. Here's an example:

>>> from lark import Lark
>>> parser = Lark('operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator")
>>> parsed = parser.parse(">")
>>> parsed
Tree(operator, [])
>>> parsed.data
'operator'
>>> parsed.value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Tree' object has no attribute 'value'

Why wouldn't there be a value? Is there another way to get the exact operator that was used?

Answer 1:

Author of Lark here. Mike's answer is accurate, but a better way to get the same result is by using the "!" prefix on the rule:

>>> from lark import Lark
>>> parser = Lark('!operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator")
>>> parser.parse(">")
Tree(operator, [Token(__MORETHAN, '>')])


Answer 2:

It appears that by default Lark removes anonymous "tokens" (what it considers 'punctuation' marks). Luckily, there is an option to change that behavior called keep_all_tokens.

Here's an example with that option:

>>> from lark import Lark
>>> parser = Lark('operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator", keep_all_tokens=True)
>>> parsed = parser.parse(">")
>>> parsed
Tree(operator, [Token(__MORETHAN, '>')])
>>> parsed.children[0].value
'>'