Scala Parser Issues

Posted 2019-03-16 15:10

I am having issues testing out the Scala Parser Combinator functionality for a simple Book DSL.

Firstly there is a book class:

case class Book (name:String,isbn:String) {
def getNiceName():String = name+" : "+isbn
}

Next, there is the simple parser:

object BookParser extends StandardTokenParsers {
  lexical.reserved += ("book","has","isbn")

  def bookSpec  = "book" ~> stringLit ~> "has" ~> "isbn" ~> stringLit ^^ {
            case "book" ~ name ~ "has" ~ "isbn" ~ isbn => new Book(name,isbn) }

  def parse (s: String) = {
    val tokens = new lexical.Scanner(s)
    phrase(bookSpec)(tokens)
  }

  def test (exprString : String) = {
     parse (exprString) match {
         case Success(book) => println("Book"+book.getNiceName())
     }
  }

  def main (args: Array[String]) = {
     test ("book ABC has isbn DEF")
  }   
}

I'm getting a range of errors when trying to compile this, some of which seem strange to me when I compare the code with other examples on the internet. For example, the bookSpec function appears nearly identical to those examples.

Is this the best way to build a simple parser like this?

Thanks

Tags: scala parsing
2 answers
乱世女痞
#2 · 2019-03-16 15:37

You're on the right track. There are a few issues in your parser. I'll post the corrected code, then explain the changes.

import scala.util.parsing.combinator._
import scala.util.parsing.combinator.syntactical._

case class Book (name: String, isbn: String) {
  def niceName = name + " : " + isbn
}


object BookParser extends StandardTokenParsers {
  lexical.reserved += ("book","has","isbn")

  def bookSpec: Parser[Book]  = "book" ~ ident ~ "has" ~ "isbn" ~ ident ^^ {
            case "book" ~ name ~ "has" ~ "isbn" ~ isbn => new Book(name, isbn) }

  def parse (s: String) = {
    val tokens = new lexical.Scanner(s)
    phrase(bookSpec)(tokens)
  }

  def test (exprString : String) = {
     parse (exprString) match {
       case Success(book, _) => println("Book: " + book.niceName)
       case Failure(msg, _) => println("Failure: " + msg)
       case Error(msg, _) => println("Error: " + msg)
     }
  }

  def main (args: Array[String]) = {
     test ("book ABC has isbn DEF")
  }   
}

1. Parser return value

In order to return a book from a parser, you need to give the type inferencer some help. I changed the definition of the bookSpec function to be explicit: it returns a Parser[Book]. That is, it returns an object which is a parser for books.

2. stringLit

The stringLit function you used comes from the StdTokenParsers trait. stringLit is a function that returns Parser[String], but the pattern it matches includes the double-quotes that most languages use to delimit a string literal. If you are happy with double-quoting words in your DSL, then stringLit is what you want. In the interest of simplicity, I replaced stringLit with ident. ident looks for a Java-language identifier. This isn't really the right format for ISBNs, but it did pass your test case. :-)

To match ISBNs correctly, I think you'll need to use a regular expression instead of ident.
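One way to try the regular-expression idea is to switch from StandardTokenParsers to RegexParsers, which works on characters rather than tokens. The following is only a sketch under assumptions: the object name RegexBookParser, the helper parseBook, and both patterns are made up for illustration, and the ISBN pattern (digits and hyphens, optionally ending in X) is a rough approximation, not a real ISBN validator.

```scala
import scala.util.parsing.combinator.RegexParsers

case class Book(name: String, isbn: String)

// Hypothetical parser for illustration; not the code from the question.
object RegexBookParser extends RegexParsers {
  // Assumed name format: a letter followed by letters or digits.
  def name: Parser[String] = """[A-Za-z][A-Za-z0-9]*""".r

  // Assumed ISBN format: digits and hyphens, last character a digit or X.
  def isbn: Parser[String] = """[0-9][0-9-]*[0-9Xx]""".r

  // Keywords are plain string literals here; whitespace is skipped by default.
  def bookSpec: Parser[Book] =
    ("book" ~> name <~ "has" <~ "isbn") ~ isbn ^^ { case n ~ i => Book(n, i) }

  // parseAll requires the whole input to be consumed.
  def parseBook(s: String): ParseResult[Book] = parseAll(bookSpec, s)
}
```

With this sketch, RegexBookParser.parseBook("book ABC has isbn 978-0-306-40615-7") should succeed, while input that lacks the isbn keyword should fail.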

3. Ignore-left sequence

Your matcher used a string of ~> combinators. ~> is a function that takes two Parser[_] objects and returns a Parser that recognizes both in sequence, then returns only the result of the right-hand side. By using a whole chain of them to lead up to your final stringLit, your parser would ignore everything except the final word in the sentence. That means it would throw away the book name, too.

Also, when you use ~> or <~, the ignored tokens should not appear in your pattern matching.

For simplicity, I changed these all to simple sequence functions and left the extra tokens in the pattern match.

4. Matching results

The test method needs to match all the possible results from the parse() function. So, I added the Failure() and Error() cases. Also, even Success includes both your return value and the Reader object. We don't care about the reader, so I just used "_" to ignore it in the pattern match.

Hope this helps!

走好不送
#3 · 2019-03-16 15:51

When you use ~> or <~, you are discarding the element from which the arrow comes. For example:

"book" ~> stringLit // discards "book"
"book" ~> stringLit ~> "has" // discards "book" and then stringLit
"book" ~> stringLit ~> "has" ~> "isbn" // discards everything except "isbn"
"book" ~> stringLit ~> "has" ~> "isbn" ~> stringLit // discards everything but the last stringLit

You could write it like this:

def bookSpec: Parser[Book] = ("book" ~> stringLit <~ "has" <~ "isbn") ~ stringLit ^^ {
  case name ~ isbn => new Book(name,isbn) 
}
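For completeness, here is that definition dropped into a self-contained parser. This is a sketch, not a definitive version: the object name QuotedBookParser and the helper parseBook are made up for illustration. Because it keeps stringLit, the name and ISBN must be double-quoted in the input, so the unquoted test string from the question would be rejected.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

case class Book(name: String, isbn: String)

// Hypothetical wrapper for illustration.
object QuotedBookParser extends StandardTokenParsers {
  lexical.reserved ++= List("book", "has", "isbn")

  // The parentheses matter: <~ binds less tightly than ~, so without them
  // the final stringLit would be grouped with "isbn" and discarded.
  def bookSpec: Parser[Book] =
    ("book" ~> stringLit <~ "has" <~ "isbn") ~ stringLit ^^ {
      case name ~ isbn => Book(name, isbn)
    }

  def parseBook(s: String): ParseResult[Book] =
    phrase(bookSpec)(new lexical.Scanner(s))
}
```

A call such as QuotedBookParser.parseBook("book \"ABC\" has isbn \"978-0-306-40615-7\"") should yield a Book, since string literals can contain the hyphens that ident cannot.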