I need to parse, using Megaparsec a data structure like
data Foo
= Simple String
| Dotted Foo String
where I can have alphanumeric strings separated by dots.
For example abc
should be parsed to Simple "abc"
and abc.def
to Dotted (Simple "abc") "def"
.
My parser at the moment is like
fooParser :: Parser Foo
fooParser
= Simple <$> alphaNum
<|> do
foo <- fooParser
constant "."
end <- alphaNum
pure $ Dotted foo end
This works fine for Simple
, but it does not parse any Dotted
, because the first option always succeeds parsing the first piece of the string.
Which is the best option to fix my parser?
That problem can be fixed easily enough by changing the order of your alternatives. Generally, whenever you have an alternative that can always match, that alternative must come last.
However this will only lead you to your next problem: Your
Dotted
parser is left-recursive, which parsec does not support, meaning it will lead to an infinite recursion once its actually reached.Generally when we want to use a left-recursive grammar with a parsing algorithm that does not handle left-recursion, we replace the recursion with repetition and then perform a left-fold on the resulting list. So given the original grammar:
We can rewrite it using repetition like this:
Now the most direct translation to Parsec would use
many
for the*
and then left-fold the resulting list. However, we might notice that the patternrule ("seperator" rule)*
can be more simply matched bysepBy1
. So this gives us: