Priority in grammar using Lark

Posted 2019-07-19 09:03

Question:

I have a priority problem in my grammar, and I have run out of ideas for fixing it.

I'm using Lark.

Here is the thing (I have simplified the problem as much as I can):

from lark import Lark

parser = Lark(r"""
    start: set | set_mul

    set_mul: [nb] set
    set: [nb] "foo"
    nb: INT "x"

    %import common.INT
    %import common.WS
    %ignore WS

    """, start='start')

text = "3xfoo"
p = parser.parse(text)
print(p.pretty())

The output is:

start
  set_mul
    set
      nb    3

But what I want is:

start
  set_mul
    nb    3
    set

I tried assigning priorities to my rules, but that did not work.

Do you have any idea what I would need to change to make it work?

Thanks

Answer 1:

A simple solution might be to re-write your grammar to remove the ambiguity.

from lark import Lark

parser = Lark(r"""
    start: set | set_mul

    set_mul: nb | nb set | nb nb_set
    set: "foo"
    nb_set: nb set
    nb: INT "x"

    %import common.INT
    %import common.WS
    %ignore WS

    """, start='start')

This way, each of the following inputs has only one possible interpretation:

text = "3xfoo"
p = parser.parse(text)
print(p.pretty())

text = "3x4xfoo"
p = parser.parse(text)
print(p.pretty())

Result:

start
  set_mul
    nb  3
    set

start
  set_mul
    nb  3
    nb_set
      nb    4
      set


Answer 2:

This is not a full answer, but I hope it gets you part of the way. Your problem is that your grammar is ambiguous, and the example you use hits that ambiguity head-on: the nb in "3xfoo" can attach either to set or to set_mul. Lark chooses to disambiguate for you, and you get the result you see.

To stop Lark from resolving the ambiguity for you, pass ambiguity='explicit':

import lark

parser = lark.Lark(r"""
    start: set | set_mul

    set_mul: [nb] set
    set: [nb] "foo"
    nb: INT "x"

    %import common.INT
    %import common.WS
    %ignore WS

    """, start='start', ambiguity='explicit')

text = "3xfoo"
p = parser.parse(text)
print(p.pretty())

You then get every possible parse, including the one you want:

_ambig
  start
    set
      nb        3
  start
    set_mul
      set
        nb      3
  start
    set_mul
      nb        3
      set

How can you encourage Lark to disambiguate to your preferred output? Good question.