Returning Error for Invalid Parse inside of `rep`

Posted 2019-07-24 23:01

Question:

In the following DSL, I'm successfully parsing "foo", followed by 0 or more repetitions of conj ~ noun.

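import scala.util.parsing.combinator.JavaTokenParsers

object Foo extends JavaTokenParsers {

  // Wrap a keyword in word boundaries so that it only matches as a whole word
  def word(x: String) = s"\\b$x\\b".r

  lazy val expr  = word("foo") ~ rep(conj ~ noun)

  val noun   = word("noun")
  val conj   = word("and") | err("not a conjunction!")

}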

Credit: Thanks to Travis Brown for explaining the need for the word function here.

It behaves as expected when given an invalid conjunction:

scala> Foo.parseAll(Foo.expr, "foo an3 noun")
res29: Foo.ParseResult[Foo.~[String,List[Foo.~[java.io.Serializable,String]]]] =
[1.5] error: not a conjunction!

foo an3 noun
    ^

But another test shows that it's not working: foo and noun should succeed.

scala> Foo.parseAll(Foo.expr, "foo and noun")
res31: Foo.ParseResult[Foo.~[String,List[Foo.~[java.io.Serializable,String]]]] =
[1.13] error: not a conjunction!

foo and noun
            ^

Since this passed-in String consists only of foo and noun, I'm not sure what other characters/tokens are being read.

I then replaced the above err with failure, but that's no good either:

scala> Foo.parseAll(Foo.expr, "foo a3nd noun")
res32: Foo.ParseResult[Foo.~[String,List[Foo.~[java.io.Serializable,String]]]] =
[1.5] failure: string matching regex `\z' expected but `a' found

foo a3nd noun
    ^

I believe that Parsers#rep explains the last failure message:

def rep[T](p: => Parser[T]): Parser[List[T]] = rep1(p) | success(List())

Based on this excellent answer, my understanding is that rep1(p) (where p is conj ~ noun) will fail, resulting in success(List()) (since failure allows backtracking). However, I'm not entirely sure why success(List()) is not returned. The failure message says: string matching regex `\z' expected but `a' found, that is, it expected the end of the input.
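One way to check what rep returns here is to call parse instead of parseAll, since parse does not require the whole input to be consumed. The sketch below assumes the scala-parser-combinators library is on the classpath. The object name FooWithFailure is made up for illustration, and the grammar is the one above with err swapped for failure.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical object: the question's grammar, using failure instead of err.
object FooWithFailure extends JavaTokenParsers {

  // Same helper as in the question: match x only as a whole word.
  def word(x: String) = s"\\b$x\\b".r

  val noun: Parser[String] = word("noun")

  // failure (unlike err) lets the enclosing rep backtrack.
  val conj: Parser[String] = word("and") | failure("not a conjunction!")

  lazy val expr: Parser[String ~ List[String ~ String]] =
    word("foo") ~ rep(conj ~ noun)
}
```

Under these assumptions, FooWithFailure.parse(FooWithFailure.expr, "foo a3nd noun") should succeed with an empty list for the rep part and stop right after foo, while parseAll on the same input is unsuccessful because input remains. That would suggest success(List()) is in fact returned, and that the reported failure comes from the end-of-input check that parseAll adds.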

Answer 1:

Let's go step by step through what happens when foo and noun is parsed:

  • word("foo") is tried, it matches and consumes foo from input,
  • rep is tried,
  • conj is tried,
    • word("and") is tried, it matches and consumes and from input,
    • so the second branch (err) isn't even tested,
  • word("noun") is tried, it matches and consumes noun from input,
  • rep loops and tries conj again:
    • word("and") is tried, it doesn't match,
    • so err is tried, and by its very definition, it returns an error, ending the parse here.

You don't actually want err to be tried as soon as word("and") doesn't match, because it could fail to match for a very good reason: we have reached EOF.
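The difference can be seen in isolation by running rep on empty input, which is what remains once foo and noun has been consumed. This is a minimal sketch assuming the scala-parser-combinators library is available. ConjAtEof, conjErr, conjFail, repErr, repFail and describe are names invented for the example.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical object comparing err and failure inside rep at end of input.
object ConjAtEof extends JavaTokenParsers {

  def word(x: String) = s"\\b$x\\b".r

  val noun: Parser[String] = word("noun")

  // Definition from the question: err is tried whenever "and" does not match.
  val conjErr: Parser[String] = word("and") | err("not a conjunction!")

  // Same shape with failure, for comparison.
  val conjFail: Parser[String] = word("and") | failure("not a conjunction!")

  val repErr: Parser[List[String ~ String]]  = rep(conjErr ~ noun)
  val repFail: Parser[List[String ~ String]] = rep(conjFail ~ noun)

  // Classify a parse result without relying on its message text.
  def describe(r: ParseResult[Any]): String = r match {
    case _: Error   => "error"
    case _: Failure => "failure"
    case _          => "success"
  }
}
```

With this setup, ConjAtEof.describe(ConjAtEof.parse(ConjAtEof.repErr, "")) should report an error, because rep propagates an Error instead of stopping, whereas the same call with repFail should report a success carrying an empty list.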

So let's detect EOF and only try to parse conj if we have more input. Let's write a parser that does that:

def notEOF: Parser[Unit] = Parser { in =>
  if (in.atEnd) Failure("EOF", in) else Success((), in)
}

And then:

val conj = notEOF ~> (word("and") | " *".r ~> err("not a conjunction!"))

On EOF, this returns a failure, so rep can stop looping and return whatever it has collected so far. Otherwise, it tries to parse and, and raises the error if that does not match. Note that I use the " *".r trick to make sure err always wins.
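Putting the pieces together, the whole grammar might look like the sketch below. It assumes the scala-parser-combinators library is on the classpath. The object name FixedFoo and the explicit type annotations are additions for illustration, while word, notEOF, conj, noun and expr follow the definitions above.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical object name; the grammar combines the question's DSL with notEOF.
object FixedFoo extends JavaTokenParsers {

  // Match x only as a whole word.
  def word(x: String) = s"\\b$x\\b".r

  // Fails (without consuming anything) at end of input, succeeds otherwise.
  def notEOF: Parser[Unit] = Parser { in =>
    if (in.atEnd) Failure("EOF", in) else Success((), in)
  }

  val noun: Parser[String] = word("noun")

  // Only insist on a conjunction when there is input left to read.
  val conj: Parser[String] =
    notEOF ~> (word("and") | " *".r ~> err("not a conjunction!"))

  lazy val expr: Parser[String ~ List[String ~ String]] =
    word("foo") ~ rep(conj ~ noun)
}
```

With this in place, FixedFoo.parseAll(FixedFoo.expr, "foo and noun") and FixedFoo.parseAll(FixedFoo.expr, "foo") should both succeed, while "foo an3 noun" should still produce the not a conjunction! error.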