Recursive Parser

I need to parse, using Megaparsec a data structure like

data Foo
    = Simple String
    | Dotted Foo String

where I can have alphanumeric strings separated by dots.

For example abc should be parsed to Simple "abc" and abc.def to Dotted (Simple "abc") "def".

My parser at the moment is like

fooParser :: Parser Foo
fooParser
    = Simple <$> alphaNum
    <|> do
        foo <- fooParser
        constant "."
        end <- alphaNum
        pure $ Dotted foo end

This works fine for Simple, but it does not parse any Dotted, because the first option always succeeds parsing the first piece of the string.

Which is the best option to fix my parser?

标签： parsing haskell megaparsec

1条回答

家丑人穷心不美

2楼-- · 2019-06-23 20:07

it does not parse any Dotted, because the first option always succeeds parsing the first piece of the string.

That problem can be fixed easily enough by changing the order of your alternatives. Generally, whenever you have an alternative that can always match, that alternative must come last.

However this will only lead you to your next problem: Your Dotted parser is left-recursive, which parsec does not support, meaning it will lead to an infinite recursion once its actually reached.

Generally when we want to use a left-recursive grammar with a parsing algorithm that does not handle left-recursion, we replace the recursion with repetition and then perform a left-fold on the resulting list. So given the original grammar:

foo ::= alphanum
      | foo "." alphanum

We can rewrite it using repetition like this:

foo ::= alphanum ("." alphanum)*

Now the most direct translation to Parsec would use many for the * and then left-fold the resulting list. However, we might notice that the pattern rule ("seperator" rule)* can be more simply matched by sepBy1. So this gives us:

fooParser =
  do
    first : rest <- sepBy1 alphanum $ constant "."
    return $ foldl Dotted (Simple first) rest

0人赞添加讨论(0) 举报

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间