How can I create a parser combinator in which line

Posted 2019-02-06 00:19

I am creating a DSL, and using Scala's parser combinator library to parse the DSL. The DSL follows a simple, Ruby-like syntax. A source file can contain a series of blocks that look like this:

create_model do
  at 0,0,0
end

Line endings are significant in the DSL, as they are effectively used as statement terminators.

I wrote a Scala parser that looks like this:

import scala.util.parsing.combinator.JavaTokenParsers

class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = ident
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = ident
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  def constant: Parser[Any] = wholeNumber | floatingPointNumber
}

Since line endings matter, I overrode whiteSpace so that it'll only treat spaces and tabs as whitespace (instead of treating new lines as whitespace, and thus ignoring them).

This works, except for the "end" statement for commandBlock. Since my source file contains a trailing new line, the parser complains that it was expecting just an end but got a new line after the end keyword.

So I changed commandBlock's definition to this:

def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)

(That is, I added an optional new line after "end").

But now, when parsing the source file, I get the following error:

[4.1] failure: `end' expected but `' found

I think this is because, after it sucks in the trailing new line, the parser is encountering an empty string which it thinks is invalid, but I'm not sure why it's doing this.

Any tips on how to fix this? I might be extending the wrong parser from Scala's parser combinator library, so any suggestions on how to create a language definition with significant new line characters are also welcome.

2 answers
爷的心禁止访问
Reply #2 · 2019-02-06 01:11

You can either override the protected val whiteSpace (a Regex), whose default is """\s+""".r, or override the protected def handleWhiteSpace(...) method if you need more control than a regular expression readily gives you. Both members originate in RegexParsers, which is the base class of JavaTokenParsers.
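As a minimal sketch of the first option (LineParser, word, line and lines are hypothetical names for illustration, and the scala-parser-combinators library is assumed to be on the classpath), restricting whiteSpace to spaces and tabs leaves newlines in the input, so the grammar has to match them explicitly:

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo parser, not taken from the question.
object LineParser extends RegexParsers {
  // Skip only spaces and tabs; newlines stay in the input stream.
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[String] = """(\r?\n)+""".r
  def word: Parser[String] = """[a-z]+""".r
  // One or more words terminated by a line ending.
  def line: Parser[List[String]] = rep1(word) <~ eol
  def lines: Parser[List[List[String]]] = rep(line)
}

val result = LineParser.parseAll(LineParser.lines, "a b\nc\n")
```

Here the line ending acts as a terminator: each inner list holds the words of one line.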

Ridiculous、
Reply #3 · 2019-02-06 01:14

I get the same error either way, but I think you are misinterpreting it. What it's saying is that it is expecting an end, but it has already reached the end of the input.

And the reason that is happening is that end is being read as a statement: end is a valid ident, so statement accepts it as a function name with an empty argument list, and the real "end" keyword is never seen. Now, I'm sure there's a nice way to solve this, but I'm not experienced enough with Scala parsers. It seems the way to go would be to use token parsers with a scanning part, but I couldn't figure out a way to make the standard token parser not treat newlines as whitespace.
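The claim can be checked with a cut-down version of the question's grammar. This is only a sketch: StatementCheck is a hypothetical name, and number handling is reduced to floatingPointNumber for brevity.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical cut-down version of the question's grammar.
object StatementCheck extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r
  def eol: Parser[Any] = """(\r?\n)+""".r
  def argument: Parser[Any] = stringLiteral | floatingPointNumber
  // Same shape as the question's statement rule.
  def statement: Parser[Any] = ident ~ repsep(argument, ",") ~ eol
}

// "end" followed by a newline parses as a complete statement.
val endAsStatement =
  StatementCheck.parseAll(StatementCheck.statement, "end\n")
```

Because this parse succeeds, rep(statement) swallows the closing line of the block, and the literal "end" that follows finds nothing left to match.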

So, here's an alternative:

import scala.util.parsing.combinator.JavaTokenParsers

class ML3D extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r
  def keywords: Parser[Any] = "do" | "end"
  def identifier: Parser[Any] = not(keywords)~ident

  def model: Parser[Any] = commandList
  def commandList: Parser[Any] = rep(commandBlock)
  def commandBlock: Parser[Any] = command~"do"~eol~statementList~"end"~opt(eol)
  def eol: Parser[Any] = """(\r?\n)+""".r
  def command: Parser[Any] = commandName~opt(commandLabel)
  def commandName: Parser[Any] = identifier
  def commandLabel: Parser[Any] = stringLiteral
  def statementList: Parser[Any] = rep(statement)
  def statement: Parser[Any] = functionName~argumentList~eol
  def functionName: Parser[Any] = identifier
  def argumentList: Parser[Any] = repsep(argument, ",")
  def argument: Parser[Any] = stringLiteral | constant
  // Float first: wholeNumber would match only the "1" of "1.5" and leave ".5" unparsed.
  def constant: Parser[Any] = floatingPointNumber | wholeNumber
}
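A quick way to try this out is to run a condensed form of the grammar against the sample block from the question. The sketch below makes some assumptions: ML3DCheck is a hypothetical name, several rules are inlined, and numbers are reduced to floatingPointNumber, which also accepts plain integers.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Condensed sketch of the grammar; ML3DCheck is a hypothetical name.
object ML3DCheck extends JavaTokenParsers {
  override val whiteSpace = """[ \t]+""".r
  def keywords: Parser[Any] = "do" | "end"
  // Reject keywords before trying to read an identifier.
  def identifier: Parser[Any] = not(keywords) ~ ident
  def eol: Parser[Any] = """(\r?\n)+""".r
  // Simplification: floatingPointNumber also accepts plain integers.
  def argument: Parser[Any] = stringLiteral | floatingPointNumber
  def statement: Parser[Any] = identifier ~ repsep(argument, ",") ~ eol
  def commandBlock: Parser[Any] =
    identifier ~ opt(stringLiteral) ~ "do" ~ eol ~ rep(statement) ~ "end" ~ opt(eol)
  def model: Parser[Any] = rep(commandBlock)
}

val source = "create_model do\n  at 0,0,0\nend\n"
val parsed = ML3DCheck.parseAll(ML3DCheck.model, source)

// Caveat: the literal "end" also matches the start of longer names,
// so an identifier such as "endpoint" is rejected by not(keywords).
val prefixClash = ML3DCheck.parseAll(ML3DCheck.identifier, "endpoint")
```

The sample block parses, trailing newline included. The second check shows a limitation of matching keywords as plain string literals: names that merely begin with do or end are refused as well. Matching the keywords with a regex that ends in a word boundary might be one way to tighten this, though that variant is not tested here.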