I am experimenting with parser combinators and I often run into what seems like infinite recursions. Here is the first one I ran into:
import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader
class CombinatorParserTest extends Parsers {
  type Elem = Char
  def notComma = elem("not comma", _ != ',')
  def notEndLine = elem("not end line", x => x != '\r' && x != '\n')
  def text = rep(notComma | notEndLine)
}
object CombinatorParserTest {
  def main(args: Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    // does not get here
    println(r)
  }
}
How can I print what is going on? And why does this not finish?
Logging the attempts to parse notComma and notEndLine shows that it is the end-of-file (shown as a CTRL-Z in the log(...)("mesg") output) that is being repeatedly parsed. Here's how I modified your parser for this purpose:
def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))
I'm not entirely sure what's going on (I tried many variations on your grammar), but I think it's something like this: The EOF is not really a character artificially introduced into the input stream, but rather a sort of perpetual condition at the end of the input. Thus this never-consumed EOF pseudo-character is repeatedly parsed as "either not a comma or not an end-of-line."
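The "perpetual condition" idea can be checked directly against the reader, without involving any parser. The following is a minimal sketch, assuming the scala-parser-combinators library is on the classpath; the object name EofReaderDemo is made up for illustration. It moves a reader past the single comma and then inspects what the reader reports at the end of the input.

```scala
import scala.util.parsing.input.CharSequenceReader
import scala.util.parsing.input.CharSequenceReader.EofCh

// Hypothetical demo object; only CharSequenceReader and EofCh come from the library.
object EofReaderDemo {
  def main(args: Array[String]): Unit = {
    val start = new CharSequenceReader(",")
    val end = start.rest          // reader positioned just after the comma

    println(end.atEnd)            // has all the input been consumed?
    println(end.first == EofCh)   // does the reader hand out the EOF marker?
    println(end.rest.atEnd)       // does advancing again still leave us at the end?
  }
}
```

If all three lines print true, the reader keeps handing out the same EOF marker no matter how often it is advanced, so a predicate that accepts that marker can succeed indefinitely inside rep.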
Ok, I think I've figured this out. CharSequenceReader returns '\032' as a marker for the end of the input. So if I modify my parsers to reject that marker, like this, it works:
import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader
class CombinatorParserTest extends Parsers {
  type Elem = Char
  import CharSequenceReader.EofCh
  def notComma = elem("not comma", x => x != ',' && x != EofCh)
  def notEndLine = elem("not end line", x => x != '\r' && x != '\n' && x != EofCh)
  //def text = rep(notComma | notEndLine)
  def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))
}
object CombinatorParserTest {
  def main(args: Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    println(r)
  }
}
See the source code for CharSequenceReader. If the scaladoc had mentioned this, it would have saved me a lot of time.
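One thing worth noting about the grammar itself: every character is either not a comma or not a line terminator, so notComma | notEndLine accepts any character that is not the EOF marker, including the comma. If the intent is a run of characters that are neither commas nor line terminators, a single predicate may be closer to it. The following is a sketch under that assumption; PlainTextParser and textChar are hypothetical names, not part of the library.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.parsing.input.CharSequenceReader.EofCh

// Hypothetical parser object illustrating a single combined predicate.
object PlainTextParser extends Parsers {
  type Elem = Char

  // Accepts a character only if it is not a comma, not a line terminator,
  // and not the end-of-input marker returned by CharSequenceReader.
  def textChar: Parser[Char] =
    elem("text character", c => c != ',' && c != '\r' && c != '\n' && c != EofCh)

  // Zero or more such characters; stops at the first rejected one.
  def text: Parser[List[Char]] = rep(textChar)

  def main(args: Array[String]): Unit = {
    println(text(new CharSequenceReader("abc,def")))
  }
}
```

With this version the repetition should stop at the first comma, line terminator, or end of input instead of consuming the comma.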
I find the logging function extremely awkward to type. Why do I have to write log(parser)("string")? Why not have something as simple as parser.log("string")? Anyway, to overcome that, I made this instead:
trait Logging { self: Parsers =>
  // Used to turn logging on or off
  val debug: Boolean
  // Much easier than having to wrap a parser with a log function and type a message
  // i.e. log(someParser)("Message") vs someParser.log("Message")
  implicit class Logged[+A](parser: Parser[A]) {
    def log(msg: String): Parser[A] =
      if (debug) self.log(parser)(msg) else parser
  }
}
Now in your parser, you can mix in this trait like so:
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
object CombinatorParserTest extends App with Parsers with Logging {
  type Elem = Char
  override val debug: Boolean = true
  def notComma: Parser[Char] = elem("not comma", _ != ',')
  def notEndLine: Parser[Char] = elem("not end line", x => x != '\r' && x != '\n')
  def text: Parser[List[Char]] = rep(notComma.log("notComma") | notEndLine.log("notEndLine"))
  val r = text(new CharSequenceReader(","))
  println(r)
}
You can also override the debug field to turn off the logging if so desired.
Running this also shows the second parser correctly parsed the comma:
trying notComma at scala.util.parsing.input.CharSequenceReader@506e6d5e
notComma --> [1.1] failure: not comma expected
,
^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@506e6d5e
notEndLine --> [1.2] parsed: ,
trying notComma at scala.util.parsing.input.CharSequenceReader@15975490
notComma --> [1.2] failure: end of input
,
^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@15975490
notEndLine --> [1.2] failure: end of input
,
^
The result is List(,)
Process finished with exit code 0