In the following Parser:
    import scala.util.parsing.combinator.JavaTokenParsers

    object Foo extends JavaTokenParsers {
      def word(x: String) = s"\\b$x\\b".r
      lazy val expr = aSentence | something
      lazy val aSentence = noun ~ verb ~ obj
      lazy val noun = word("noun")
      lazy val verb = word("verb") | err("not a verb!")
      lazy val obj = word("object")
      lazy val something = word("FOO")
    }
It will parse `noun verb object`:

    scala> Foo.parseAll(Foo.expr, "noun verb object")
    res1: Foo.ParseResult[java.io.Serializable] = [1.17] parsed: ((noun~verb)~object)
But when entering a valid `noun` and an invalid `verb`, why won't `err("not a verb!")` return an `Error` with that particular error message?

    scala> Foo.parseAll(Foo.expr, "noun vedsfasdf")
    res2: Foo.ParseResult[java.io.Serializable] =
    [1.6] failure: string matching regex `\bverb\b' expected but `v' found
    noun vedsfasdf
         ^

Credit: thanks to Travis Brown for explaining the need for the `word` function here.

This question seems similar, but I'm not sure how to handle `err` with the `~` function.

Here's another question you might ask: why isn't it complaining that it expected the word "FOO" but got "noun"? After all, if it fails to parse `aSentence`, it's then going to try `something`.

The culprit should be obvious when you think about it: what in that source code is taking two `Failure` results and choosing one? `|` (aka `append`).

This method on `Parser` will feed the input to both parsers, and then call `append` on `ParseResult`. That method is abstract at that level, and defined on `Success`, `Failure` and `Error` in different ways.

On both `Success` and `Error`, it always takes `this` (that is, the result from the parser on the left). On `Failure`, though, it does something else: if both sides have failed, it takes the side that read the most of the input (which is why it won't complain about a missing `FOO`), but if both have read the same amount, it gives precedence to the second failure.

I do wonder if it shouldn't check whether the right side is an `Error`, and, if so, return that. After all, if the left side is an `Error`, it always returns that. This looks suspicious to me, but maybe it's supposed to be that way. But I digress.

Back to the problem, it would seem that it should have gone with `err`, as they both consumed the same amount of input, right? Well... Here's the thing: regex parsers skip `whiteSpace` first, but that's for regex literals and literal strings. It does not apply to the other methods, including `err`.

That means that `err`'s input is at the whitespace, while the word's input is at the word, and, therefore, further on the input. To check this, make the `err` branch skip the whitespace before it fails, so that both alternatives report the same position.

Arguably, `err` ought to be overridden by `RegexParsers` to do the right thing (tm). Since Scala Parser Combinators is now a separate project, I suggest you open an issue and follow it up with a Pull Request implementing the change. It will have the impact of changing error messages for some parsers (well, that's the whole purpose of changing it :).
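
To make the selection rule described above concrete, here is a toy model of it. This is a sketch, not the library's code: `Result`, `Ok`, `Fail`, `Fatal` and the integer `pos` are hypothetical stand-ins for `ParseResult`, `Success`, `Failure`, `Error` and the input position.

```scala
// Toy model of how `|` chooses between two results.
// All names here are hypothetical; they only mirror the behaviour described in the answer.
sealed trait Result {
  def pos: Int                        // how far into the input this result got
  def append(alt: => Result): Result  // `alt` is the result of the right-hand parser
}

// A success on the left always wins.
final case class Ok(value: String, pos: Int) extends Result {
  def append(alt: => Result): Result = this
}

// An error on the left always wins as well.
final case class Fatal(msg: String, pos: Int) extends Result {
  def append(alt: => Result): Result = this
}

// A failure on the left yields to a success on the right. Otherwise the side
// that got further wins, and a tie goes to the right-hand side.
final case class Fail(msg: String, pos: Int) extends Result {
  def append(alt: => Result): Result = alt match {
    case ok: Ok => ok
    case other  => if (other.pos < pos) this else other
  }
}

object AppendDemo {
  def main(args: Array[String]): Unit = {
    val verbFail = Fail("verb expected", pos = 6)
    // The error sits one column earlier (on the whitespace), so the failure is kept.
    println(verbFail.append(Fatal("not a verb!", pos = 5)))
    // With equal positions, the right-hand side (the error) is returned.
    println(verbFail.append(Fatal("not a verb!", pos = 6)))
  }
}
```

The first call models the situation in the question, where `err` is positioned on the space before `vedsfasdf` and therefore loses to the regex failure.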
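
The whitespace explanation can be checked with a variant of the question's parser. This is a minimal sketch under stated assumptions: `Foo2` is a hypothetical name, and the `" *".r ~>` prefix is just one way to force whitespace skipping before `err` (it relies on `RegexParsers` skipping whitespace before it tries any regex, even one that matches the empty string). Only the `verb` line differs from the original.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical variant of the question's Foo; only `verb` is changed.
object Foo2 extends JavaTokenParsers {
  def word(x: String) = s"\\b$x\\b".r
  lazy val expr      = aSentence | something
  lazy val aSentence = noun ~ verb ~ obj
  lazy val noun      = word("noun")
  // The regex parser skips leading whitespace and then matches nothing, so
  // `err` starts at the same position as the failed word("verb").
  lazy val verb      = word("verb") | (" *".r ~> err("not a verb!"))
  lazy val obj       = word("object")
  lazy val something = word("FOO")
}

object Foo2Demo {
  def main(args: Array[String]): Unit = {
    // If the reasoning above holds, this reports "not a verb!" instead of the regex message.
    println(Foo2.parseAll(Foo2.expr, "noun vedsfasdf"))
    // Valid input should still parse as before.
    println(Foo2.parseAll(Foo2.expr, "noun verb object"))
  }
}
```

Because both alternatives of `verb` now stop at the same position, the tie goes to the right-hand side, and since an `Error` on the left of `|` is always kept, `something` should never get a chance to replace it.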