How to skip whitespace but use it as a token delimiter

I am trying to build a small parser where the tokens (luckily) never contain whitespace. Whitespace (spaces, tabs and newlines) is essentially the token delimiter (apart from cases where there are brackets etc.).

I am extending the RegexParsers class. If I turn on skipWhitespace, the parser greedily joins tokens together when the next token matches the regular expression of the previous one. If I turn off skipWhitespace, on the other hand, it complains because the spaces are not part of any definition. I am trying to match the BNF as closely as possible, and given that whitespace is almost always the delimiter (apart from brackets or some other cases where the delimiter is explicitly defined in the BNF), is there a way to avoid putting a whitespace regex in all my definitions?

UPDATE

This is a small test example where the tokens are being joined together:

import scala.util.parsing.combinator.RegexParsers
import scala.language.postfixOps // needed for the postfix `anyChar*` syntax on Scala 2.13+

object TestParser extends RegexParsers {
  def test  = "(test" ~> name <~ ")"

  def name : Parser[String] = (letter ~ (anyChar*)) ^^ { case first ~ rest => (first :: rest).mkString}

  def anyChar = letter | digit | "_".r | "-".r
  def letter = """[a-zA-Z]""".r
  def digit = """\d""".r

  def main(args: Array[String]): Unit = {

    val s = "(test hello these should not be joined and I should get an error)"

    val res = parseAll(test, s)
    res match {
      case Success(r, n) => println(r)
      case Failure(msg, n) => println(msg)
      case Error(msg, n) => println(msg)
    }

  }

}

In the above case I just get the words joined into a single string. I see a similar effect if I change test to the following: I expect it to give me a list of the separate words after test, but instead it joins them together and gives me a one-element list containing one long string, without the spaces in between:

def test  = "(test" ~> (name+) <~ ")"

Tags: scala parser-combinators

1 Answer

Luminary・发光体

Post #2 · 2019-05-27 10:49

With skipWhitespace on, whitespace is skipped just before every string or regex parser is applied, that is, before every terminal in the grammar. So, in this snippet:

def name : Parser[String] = (letter ~ (anyChar*)) ^^ { case first ~ rest => (first :: rest).mkString}

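It will skip whitespace before each letter and, even worse, before every repetition of anyChar (anyChar* retries anyChar once per character). The spaces between words are therefore silently consumed, and the following words are glued onto the first one.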
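A minimal sketch of the effect, assuming scala-parser-combinators is on the classpath. The object `WhitespaceDemo` and its `word` parser are hypothetical names; `word` builds a word out of per-letter regex parsers, much like `name` in the question, and with the default skipWhitespace the space in "ab cd" is swallowed before the third letter.

```scala
import scala.util.parsing.combinator.RegexParsers

// Shows that with skipWhitespace on (the default), RegexParsers skips
// whitespace before every regex parser, including the per-letter
// parsers that are meant to make up a single word.
object WhitespaceDemo extends RegexParsers {
  def letter: Parser[String] = """[a-zA-Z]""".r

  // Intended as "one word": a letter followed by more letters
  def word: Parser[String] = letter ~ rep(letter) ^^ {
    case first ~ rest => (first :: rest).mkString
  }

  def main(args: Array[String]): Unit = {
    // The space is skipped before the `letter` that matches 'c',
    // so both words end up in a single result
    println(parseAll(word, "ab cd").get)
  }
}
```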

Use one regular expression (or plain string) for each whole token, rather than building a token out of per-character parsers. Like this:

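import scala.util.parsing.combinator.RegexParsers

object TestParser extends RegexParsers {
  def test  = "(test" ~> name <~ ")"
  // One regex for the whole token: whitespace can be skipped before it, never inside it
  def name : Parser[String] = """[a-zA-Z][a-zA-Z0-9_-]*""".r

  // ... (main as in the question)
}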
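A self-contained sketch of the suggestion above, assuming scala-parser-combinators is on the classpath. `testMany` is a hypothetical name for the `(name+)` variant from the question, written with `rep1`, which is what `name+` expands to.

```scala
import scala.util.parsing.combinator.RegexParsers

// Each token is a single regex, so whitespace is only skipped
// between tokens, never inside one.
object TestParser extends RegexParsers {
  // A name: a letter followed by letters, digits, '_' or '-'
  def name: Parser[String] = """[a-zA-Z][a-zA-Z0-9_-]*""".r

  // Exactly one name after "(test"
  def test: Parser[String] = "(test" ~> name <~ ")"

  // One or more whitespace-separated names (same as `name+`)
  def testMany: Parser[List[String]] = "(test" ~> rep1(name) <~ ")"

  def main(args: Array[String]): Unit = {
    val s = "(test hello these should not be joined and I should get an error)"

    // Fails now: after "hello" a ')' is expected but "these" follows
    parseAll(test, s) match {
      case Success(r, _)   => println(r)
      case Failure(msg, _) => println(s"failure: $msg")
      case Error(msg, _)   => println(s"error: $msg")
    }

    // Succeeds with the words kept separate
    println(parseAll(testMany, s).get)
  }
}
```

With one regex per name, the single-name `test` now reports an error instead of joining the words, and `testMany` returns the separate words as a `List[String]`.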
