How to define Pascal variables in PetitParser

2019-06-21 03:45发布

Here is the (simplified) EBNF section I'm trying to implement in PetitParser:

variable :: component / identifier
component :: indexed / field
indexed :: variable , $[ , blah , $]
field :: variable , $. , identifier

What I did was to add all these productions (except identifier) as ivars of my subclass of PPCompositeParser and define the corresponding methods as follows:

variable
  ^component / self identifier

component
  ^indexed / field

identifier
  ^(#letter asParser, (#word asParser) star) flatten

indexed
  ^variable , $[ asParser, #digit asParser, $] asParser

field
  ^variable , $. asParser, self identifier

start
  ^variable

Finally, I created a new instance of my parser and sent to it the message parse: 'a.b[0]'.

The problem: I get a stack overflow.

2条回答
三岁会撩人
2楼-- · 2019-06-21 04:08

The grammar has a left recursion: variable -> component -> indexed -> variable. PetitParser uses Parsing Expression Grammars (PEGs) that cannot handle left recursion. A PEG parser always takes the left option until it finds a match. In this case it will not find a match due to the left recursion. To make it work you need to first eliminate left recursion. Eliminating all left recursion could be more tricky as you will also get one through field after eliminating the first. For example, you can write the grammar as follows to make the left recursion more obvious:

variable = (variable , $[ , blah , $]) | (variable , $. , identifier) | identifier

If you have a left recursion like:

A  -> A a |  b

you can eliminate it like (e is an empty parser)

A  -> b A'
A' -> a A' | e

You'll need to apply this twice to get rid of the recursion. Alternatively you can choose to simplify the grammar if you do not want to parse all possible combinations of identifiers.

查看更多
成全新的幸福
3楼-- · 2019-06-21 04:15

The problem is that your grammar is left recursive. PetitParser uses a top-down greedy algorithm to parse the input string. If you follow the steps, you'll see that it goes from start then variable -> component -> indexed -> variable. This is becomes a loop that gets executed infinitely without consuming any input, and is the reason of the stack overflow (that is the left-recursiveness in practice).

The trick to solve the situation is to rewrite the parser by adding intermediate steps to avoid left-recursing. The basic idea is that the rewritten version will consume at least one character in each cycle. Let's start by simplifying a bit the parser refactoring the non-recursive parts of ´indexed´ and ´field´, and moving them to the bottom.

variable
  ^component, self identifier

component
  ^indexed / field

indexed
  ^variable, subscript

field
  ^variable, fieldName

start
  ^variable


subscript
    ^$[ asParser, #digit asParser, $] asParser

fieldName
    ^$. asParser, self identifier

identifier
  ^(#letter asParser, (#word asParser) star) flatten

Now you can more easily see (by following the loop) that if the recursion in variable is to end, an identifier has to be found at the beginning. That's the only way to start, and then comes more input (or ends). Let's call that second part variable':

variable
    ^self identifier, variable'

now the variable' actually refers to something with the identifier consumed, and we can safely move the recusion from the left of indexed and field to the right in variable':

variable'
    component', variable' / nil asParser

component'
    ^indexed' / field'

indexed'
    ^subscript

field'
    ^fieldName

I've written this answer without actually testing the code, but should be okish. The parser can be further simplified, I leave that as an excercise ;).

For more information on left-recursion elimination you can have a look at left recursion elimination

查看更多
登录 后发表回答