I am creating a DSL, and using Scala's parser combinator library to parse the DSL. The DSL follows a simple, Ruby-like syntax. A source file can contain a series of blocks that look like this:
create_model do
  at 0,0,0
end
Line endings are significant in the DSL, as they are effectively used as statement terminators.
I wrote a Scala parser that looks like this:
class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = ident
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  def constant: Parser[Any] = wholeNumber | floatingPointNumber
}
Since line endings matter, I overrode whiteSpace so that it'll only treat spaces and tabs as whitespace (instead of treating new lines as whitespace, and thus ignoring them).
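To make the effect of that override concrete, here is a minimal sketch comparing the default whitespace handling with the overridden one. It assumes the scala-parser-combinators module is on the classpath; the object names (DefaultWs, LineAwareWs, WhitespaceDemo) and the sample input are hypothetical and not part of the DSL above.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Default behaviour: whiteSpace is """\s+""".r, so newlines are skipped
// silently between tokens.
object DefaultWs extends JavaTokenParsers {
  def words: Parser[List[String]] = rep(ident)
}

// Overridden behaviour: only spaces and tabs are skipped, so the grammar
// has to match newlines explicitly (here via eol).
object LineAwareWs extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[String] = """(\r?\n)+""".r
  def words: Parser[List[String]] = rep(ident)
  def lines: Parser[List[List[String]]] = rep(words <~ eol)
}

object WhitespaceDemo {
  def main(args: Array[String]): Unit = {
    val input = "at origin\nend\n"

    // Newlines count as whitespace: all identifiers end up in one flat list.
    println(DefaultWs.parseAll(DefaultWs.words, input))

    // Newlines are not skipped and `words` does not match them: this fails.
    println(LineAwareWs.parseAll(LineAwareWs.words, input))

    // Newlines are matched by eol: one list of identifiers per line.
    println(LineAwareWs.parseAll(LineAwareWs.lines, input))
  }
}
```

Once the override is in place, every rule that can end a line has to consume the newline itself, because the parser no longer skips it between tokens.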
This works, except for the "end" statement for commandBlock. Since my source file contains a trailing new line, the parser complains that it was expecting just an end but got a new line after the end keyword.
So I changed commandBlock's definition to this:
def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)
(That is, I added an optional new line after "end").
But now, when parsing the source file, I get the following error:
[4.1] failure: `end' expected but `' found
I think this is because, after it sucks in the trailing new line, the parser encounters an empty string which it thinks is invalid, but I'm not sure why it's doing this.
Any tips on how to fix this? I might be extending the wrong parser from Scala's parser combinator library, so any suggestions on how to create a language definition with significant new line characters are also welcome.
You can either override the protected val whiteSpace (a Regex), whose default is """\s+""".r, or override the protected def handleWhiteSpace(...) method if you need more control than is readily achieved with a regular expression. Both of these members originate in RegexParsers, which is the base class for JavaTokenParsers.

I get the same error both ways, but I think you are misinterpreting it. What it's saying is that it is expecting an end, but it has already reached the end of the input.
The reason that is happening is that end is being read as a statement: functionName is just ident, which matches end, argumentList may be empty, and the trailing new line satisfies eol. Now, I'm sure there's a nice way to solve this, but I'm not experienced enough with Scala parsers. It seems the way to go would be to use token parsers with a scanning part, but I couldn't figure out a way to make the standard token parser not treat newlines as whitespace.
So, here's an alternative: