In the following DSL, I'm successfully parsing "foo", followed by 0 or more repetitions of conj ~ noun.
import scala.util.parsing.combinator.JavaTokenParsers

object Foo extends JavaTokenParsers {
  def word(x: String) = s"\\b$x\\b".r
  lazy val expr = word("foo") ~ rep(conj ~ noun)
  val noun = word("noun")
  val conj = word("and") | err("not a conjunction!")
}
Credit: thanks to Travis Brown for explaining the need for the word function here.
It looks good when testing out an invalid conjunction.
scala> Foo.parseAll(Foo.expr, "foo an3 noun")
res29: Foo.ParseResult[Foo.~[String,List[Foo.~[java.io.Serializable,String]]]] =
[1.5] error: not a conjunction!
foo an3 noun
    ^
But another test shows that it's not working: foo and noun should succeed.
scala> Foo.parseAll(Foo.expr, "foo and noun")
res31: Foo.ParseResult[Foo.~[String,List[Foo.~[java.io.Serializable,String]]]] =
[1.13] error: not a conjunction!
foo and noun
            ^
Since the passed-in String consists only of foo and noun, I'm not sure what other characters/tokens are being read.
I replaced the above err with failure, but that's no good either:
scala> Foo.parseAll(Foo.expr, "foo a3nd noun")
res32: Foo.ParseResult[Foo.~[String,List[Foo.~[java.io.Serializable,String]]]] =
[1.5] failure: string matching regex `\z' expected but `a' found
foo a3nd noun
    ^
I believe that Parsers#rep explains the last failure message:
def rep[T](p: => Parser[T]): Parser[List[T]] = rep1(p) | success(List())
Based on this excellent answer, my understanding is that rep1(p) (where p is conj ~ noun) will fail, resulting in success(List()) (since failure allows back-tracking). However, I'm not entirely sure why success(List()) is not returned. The failure message says: failure: string matching regex `\z' expected but `a' found, i.e. it expected the end of the input.
Let's go step by step through what happens when foo and noun is getting parsed:
- word("foo") is tried, it matches and consumes foo from input,
- rep is tried,
- conj is tried,
- word("and") is tried, it matches and consumes and from input,
- so the alternative (err) isn't even tested,
- word("noun") is tried, it matches and consumes noun from input,
- rep starts looping:
- word("and") is tried, it doesn't match,
- err is tried, and by its very definition, it returns an error, ending the parse here.

You don't actually want err to be tested as soon as word("and") doesn't match, because it could fail to match for a very good reason: that we have reached EOF.

So let's detect EOF and only try to parse conj if we have more input. Let's write a parser that does that:
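A minimal sketch of such a parser, assuming it sits in a JavaTokenParsers object like the one in the question. The names EofCheck and notEOF and the failure message are hypothetical, chosen here for illustration:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object EofCheck extends JavaTokenParsers {
  // Hypothetical helper: succeeds without consuming anything while input
  // remains, and returns a plain Failure (which rep can recover from)
  // once the input is exhausted.
  val notEOF: Parser[Unit] = Parser { in =>
    if (in.atEnd) Failure("end of input", in)
    else Success((), in)
  }
}
```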
And then:
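One way conj might be guarded by that check, sketched as a full grammar under the same assumptions (notEOF is a hypothetical helper, defined inline so the block compiles on its own):

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object Foo extends JavaTokenParsers {
  def word(x: String) = s"\\b$x\\b".r

  // Hypothetical helper: fails (without erroring) at end of input.
  val notEOF: Parser[Unit] = Parser { in =>
    if (in.atEnd) Failure("end of input", in) else Success((), in)
  }

  lazy val expr = word("foo") ~ rep(conj ~ noun)
  val noun = word("noun")

  // Only look for a conjunction while input remains. If word("and") does
  // not match, the alternative skips spaces and then raises the error.
  val conj = notEOF ~> (word("and") | " *".r ~> err("not a conjunction!"))
}
```

With this sketch, Foo.parseAll(Foo.expr, "foo and noun") should succeed, while "foo an3 noun" should still be rejected. Note that notEOF inspects the raw input, so trailing whitespace counts as remaining input.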
On EOF, this returns a failure, so rep can stop looping and return with whatever it has. Otherwise, it tries to parse and, and errs if that fails. Note that I use the " *".r trick to make sure err always wins.