Suppose we have the following parser:
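import Text.ParserCombinators.Parsec

parser :: GenParser Char st String
-- The label must be attached to the result of choice, not to the list of alternatives.
parser = choice (fmap (try . string) ["head", "tail", "tales"])
    <?> "expected one of ['head', 'tail', 'tales']"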
When we parse the malformed input "ta", it returns the defined error, but because of backtracking it also reports unexpected "t" at the first position instead of unexpected " " at position 3.
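To make the position problem easy to observe, here is a minimal sketch assuming the parsec package. `failurePosition` is a hypothetical helper, not part of Parsec, and the parser is typed with `Parser` from `Text.Parsec.String` rather than `GenParser` purely for brevity.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The parser from the question: every alternative is wrapped in try,
-- so a failed alternative rewinds to where it started.
parser :: Parser String
parser =
  choice (map (try . string) ["head", "tail", "tales"])
    <?> "expected one of ['head', 'tail', 'tales']"

-- | Hypothetical helper: where Parsec says the failure happened, as
-- (line, column), or Nothing if the input parses successfully.
failurePosition :: String -> Maybe (Int, Int)
failurePosition input =
  case parse parser "" input of
    Left err -> Just (sourceLine (errorPos err), sourceColumn (errorPos err))
    Right _ -> Nothing
```

Calling `failurePosition "ta"` in ghci shows the reported position directly, which can then be compared with the column where the input actually stops matching.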
Is there an easy (or built-in) way of matching one of multiple expected strings that produces good error messages? I am talking about showing the correct position and, in this case, something like expected "tail" or "tales" instead of our hard-coded error message.
Old answer for old nonworking example
Which version of parsec do you have installed? 3.1.9 does this for me:
The added <?> error_message doesn't change anything except that it changes that last line to expecting expected one of ['foo', 'fob', 'bar'].
How to extract more errors out of Parsec
So this is one of those cases where you shouldn't trust the error message to be exhaustive about the information that is available in the system. Let me give a funky Show instance for Text.Parsec.Error:Message (which is basically what it would be if it were deriving (Show)) so that you can see what's coming out of Parsec:
You can see that secretly choice is dumping all of its information into a bunch of parallel messages, and storing "unexpected end-of-file" as SysUnExpect "". The show instance for ParseError apparently grabs the first SysUnExpect but all of the Expect messages and dumps them for you to see.
The exact function which does this at present is Text.Parsec.Error:showErrorMessages. The error messages are expected to be in order and are broken into 4 chunks based on the constructor; the
SysUnExpect chunk is sent through a special display function which hides the text completely if there are bona-fide UnExpect elements or else shows only the first SysUnExpect message:
It may be worth rewriting this or sending a bug upstream, as this is kinda weird behavior, and the data structures don't quite suit it. First, your problem in a nutshell is: it seems like each Message should have a SourcePos, not each ParseError.
So, there is an earlier step,
mergeError, which prefers ParseErrors with later SourcePos-es. This doesn't fire because messages don't have a SourcePos, which means that all of the errors from choice start at the beginning of the string rather than at the maximal point matched. You can see this for example in how this doesn't get stuck on parsing "tai":
Second, apart from that, probably we should bind together messages that go together (so the default message is
unexpected 't', expected "heads" | unexpected end-of-file, expected 'tails' | unexpected end-of-file, expected 'tales'
unless you override it with <?>). Third, probably the ParseError constructor should be exported; fourth, the enumerated type in Message is really ugly and might be better put into ParseError {systemUnexpected :: [Message], userUnexpected :: [Message], expected :: [Message], other :: [Message]} or something, even in its present incarnation. (For example, the current Show for ParseError will break subtly if the messages aren't in a certain order.)
In the meantime I would recommend writing your own show variant for ParseError.
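As a rough illustration of such a show variant, here is a minimal sketch that prints every message stored in a ParseError, tagged with its constructor, instead of only the first SysUnExpect. It relies on `errorMessages`, `errorPos` and the `Message` constructors exported by `Text.Parsec.Error`; the names `describeMessage`, `showAllMessages` and `explainFailure` are hypothetical and not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.Error (Message (..), errorMessages)

-- | Hypothetical helper: constructor name plus payload, roughly what a
-- derived Show instance for Message would print.
describeMessage :: Message -> String
describeMessage (SysUnExpect s) = "SysUnExpect " ++ show s
describeMessage (UnExpect s)    = "UnExpect " ++ show s
describeMessage (Expect s)      = "Expect " ++ show s
describeMessage (Message s)     = "Message " ++ show s

-- | Hypothetical show variant: the error position on the first line,
-- then one indented line per stored message, with nothing hidden.
showAllMessages :: ParseError -> String
showAllMessages err =
  unlines (show (errorPos err) : map (("  " ++) . describeMessage) (errorMessages err))

-- | Hypothetical wrapper: run a parser over a String and render a failure
-- with showAllMessages; Nothing means the parse succeeded.
explainFailure :: Parsec String () a -> String -> Maybe String
explainFailure p input =
  either (Just . showAllMessages) (const Nothing) (parse p "" input)
```

A plain function is used here rather than a Show instance for Message, so the sketch does not depend on whether the installed parsec version already defines one. Note that this only exposes the messages; it does not fix the position, since there is still a single SourcePos per ParseError.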
It's not hard to cook up a function which does this correctly. We'll just rip one character off at a time, using Data.Map to find the shared suffixes:
We can verify in ghci that it successfully matches "tail" and "tales", and that it asks for i or l after a failed parse starting with ta.
Here's what I've got with Parsec:
If you care to try a modern version of Parsec, Megaparsec, you will end up with:
What's going on here? First, when we parse an ordered collection of characters, like with string, we display the incorrect input completely. This is much better in our opinion because we're pointing to the beginning of the word and we show the whole thing that is not correct (up to the first mismatching character) and the whole thing we expect. This is more readable in my opinion. Only parsers built on tokens work this way (that is, when we're trying to match a fixed string; a case-insensitive variant is also available).
Then, what about unexpected "ta" or 't', why do we get the 't' part? This is also absolutely correct, because with your collection of alternatives, the first letter 't' can also be unexpected by itself, since you have an alternative that doesn't start with 't'. Let's see another example:
Or how about:
Parsec:
Why take pains to make it work when it can “just work”?
There are many other great things about Megaparsec; if you're interested, you can learn more about it here. It's hard to compete with Parsec, but we have written our own tutorials and our docs are very good.