I'm trying to parse a little pseudo-code language I'm writing, and I'm having trouble getting the values of symbols. The input parses successfully, but the resulting tree doesn't carry the matched symbol the way it would with "regular" characters. Here's an example:
>>> from lark import Lark
>>> parser = Lark('operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator")
>>> parsed = parser.parse(">")
>>> parsed
Tree(operator, [])
>>> parsed.data
'operator'
>>> parsed.value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Tree' object has no attribute 'value'
Why isn't there a value? Is there another way to get the exact operator that was matched?
Author of Lark here. Mike's answer is accurate, but a better way to get the same result is by using the "!" prefix on the rule:
>>> from lark import Lark
>>> parser = Lark('!operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator")
>>> parser.parse(">")
Tree(operator, [Token(__MORETHAN, '>')])
It appears that by default Lark filters unnamed tokens (what it considers "punctuation") out of the tree. Luckily, there is an option that changes this behavior, called keep_all_tokens.
Here's an example with that option:
>>> from lark import Lark
>>> parser = Lark('operator: "<" | ">" | "=" | ">=" | "<=" | "!="', start="operator", keep_all_tokens=True)
>>> parsed = parser.parse(">")
>>> parsed
Tree(operator, [Token(__MORETHAN, '>')])
>>> parsed.children[0].value
'>'