I need to parse, using Megaparsec a data structure like
data Foo
= Simple String
| Dotted Foo String
where I can have alphanumeric strings separated by dots.
For example abc
should be parsed to Simple "abc"
and abc.def
to Dotted (Simple "abc") "def"
.
My parser at the moment is like
fooParser :: Parser Foo
fooParser
= Simple <$> alphaNum
<|> do
foo <- fooParser
constant "."
end <- alphaNum
pure $ Dotted foo end
This works fine for Simple
, but it does not parse any Dotted
, because the first option always succeeds parsing the first piece of the string.
Which is the best option to fix my parser?
it does not parse any Dotted, because the first option always succeeds parsing the first piece of the string.
That problem can be fixed easily enough by changing the order of your alternatives. Generally, whenever you have an alternative that can always match, that alternative must come last.
However this will only lead you to your next problem: Your Dotted
parser is left-recursive, which parsec does not support, meaning it will lead to an infinite recursion once its actually reached.
Generally when we want to use a left-recursive grammar with a parsing algorithm that does not handle left-recursion, we replace the recursion with repetition and then perform a left-fold on the resulting list. So given the original grammar:
foo ::= alphanum
| foo "." alphanum
We can rewrite it using repetition like this:
foo ::= alphanum ("." alphanum)*
Now the most direct translation to Parsec would use many
for the *
and then left-fold the resulting list. However, we might notice that the pattern rule ("seperator" rule)*
can be more simply matched by sepBy1
. So this gives us:
fooParser =
do
first : rest <- sepBy1 alphanum $ constant "."
return $ foldl Dotted (Simple first) rest