Recursive Parser

2019-06-23 19:07发布

问题:

I need to parse, using Megaparsec a data structure like

data Foo
    = Simple String
    | Dotted Foo String

where I can have alphanumeric strings separated by dots.

For example abc should be parsed to Simple "abc" and abc.def to Dotted (Simple "abc") "def".

My parser at the moment is like

fooParser :: Parser Foo
fooParser
    = Simple <$> alphaNum
    <|> do
        foo <- fooParser
        constant "."
        end <- alphaNum
        pure $ Dotted foo end

This works fine for Simple, but it does not parse any Dotted, because the first option always succeeds parsing the first piece of the string.

Which is the best option to fix my parser?

回答1:

it does not parse any Dotted, because the first option always succeeds parsing the first piece of the string.

That problem can be fixed easily enough by changing the order of your alternatives. Generally, whenever you have an alternative that can always match, that alternative must come last.

However this will only lead you to your next problem: Your Dotted parser is left-recursive, which parsec does not support, meaning it will lead to an infinite recursion once its actually reached.

Generally when we want to use a left-recursive grammar with a parsing algorithm that does not handle left-recursion, we replace the recursion with repetition and then perform a left-fold on the resulting list. So given the original grammar:

foo ::= alphanum
      | foo "." alphanum

We can rewrite it using repetition like this:

foo ::= alphanum ("." alphanum)*

Now the most direct translation to Parsec would use many for the * and then left-fold the resulting list. However, we might notice that the pattern rule ("seperator" rule)* can be more simply matched by sepBy1. So this gives us:

fooParser =
  do
    first : rest <- sepBy1 alphanum $ constant "."
    return $ foldl Dotted (Simple first) rest