I'm trying to create a parser that combines Regex parsers and a custom parser I have. I've looked at Scala: How to combine parser combinators from different objects, but that question and its answers deal with parsers that have the same type of Elem.
Say I have a few RegexParsers, and also a parser that does a lookup for a String:
trait NumbersParsers extends RegexParsers {
def number = """\d+""".r
}
trait LookupParsers extends Parsers {
type Elem = String
def word = elem("word", (potential: String) => dictionary.exists(_.equals(potential)))
}
If I combine these parsers naively:
object MyParser extends NumbersParsers with LookupParsers {
def quantitive = number ~ word
}
I obviously get type errors because of the different types of Elem. How do I combine these parsers?
I feel somewhat responsible for answering this one since I asked and answered Scala: How to combine parser combinators from different objects.
The quick answer is that you can't combine parsers with different types of Elem. A different, elegant way to solve this problem uses ^? to augment a regex parser with extra filtering.
It might be helpful to read up on Combinator Parsing in Programming in Scala:
Parser Input
Sometimes, a parser reads a stream of tokens instead of a raw sequence of characters. A separate lexical analyzer is then used to convert a stream of raw characters into a stream of tokens. The type of parser inputs is defined as follows:
type Input = Reader[Elem]
The class Reader comes from the package scala.util.parsing.input. It is similar to a Stream, but also keeps track of the positions of all the elements it reads. The type Elem represents individual input elements. It is an abstract type member of the Parsers trait:
type Elem
This means that subclasses and subtraits of Parsers need to instantiate class Elem to the type of input elements that are being parsed. For instance, RegexParsers and JavaTokenParsers fix Elem to be equal to Char.
So Elem is used by the lexical analyzer, which is responsible for chopping up your input stream into the smallest possible tokens that the parser wants to deal with. Since you want to deal with regular expressions, your Elem is Char.
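To see what fixing Elem to Char means in practice, here is a minimal sketch. The names ElemDemo and digit are made up for illustration, and it assumes the scala-parser-combinators library is on the classpath (it is a separate module since Scala 2.11):

```scala
import scala.util.parsing.combinator._

// Hypothetical demo object, not part of the question's code.
object ElemDemo extends RegexParsers {
  // elem(kind, predicate) accepts exactly one input element.
  // In RegexParsers an input element is a Char, so the result
  // can be annotated as Parser[Char] and the predicate sees a Char.
  def digit: Parser[Char] = elem("digit", _.isDigit)
}
```

With this, ElemDemo.parseAll(ElemDemo.digit, "7") succeeds with the character '7', while the same call on "x" fails. A predicate over whole Strings, like the one in the question's LookupParsers, has no place here because each element is a single character.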
But don't worry. Just because your lexer gives you Chars doesn't mean your parser is stuck with them too. What RegexParsers gives you is an implicit conversion from a regex to Parser[String]. You can further convert the results using the ^^ operator (fully maps input) and the ^? operator (partially maps input).
Let's incorporate them into your parsers:
scala> import scala.util.parsing.combinator._
scala> val dictionary = Map("Foo" -> "x")
dictionary: scala.collection.immutable.Map[String,String] = Map(Foo -> x)
scala> trait NumbersParsers extends RegexParsers {
| def number: Parser[Int] = """\d+""".r ^^ { _.toInt }
| }
defined trait NumbersParsers
scala> trait LookupParsers extends RegexParsers {
| def token: Parser[String] = """\w+""".r
| def word =
| token ^? ({
| case x if dictionary.contains(x) => x
| }, {
| case s => s + " is not found in the dictionary!"
| })
| }
defined trait LookupParsers
scala> object MyParser extends NumbersParsers with LookupParsers {
| def quantitive = number ~ word
|
| def main(args: Array[String]): Unit = {
| println(parseAll(quantitive, args(0) ))
| }
| }
defined module MyParser
scala> MyParser.main(Array("1 Foo"))
[1.6] parsed: (1~Foo)
scala> MyParser.main(Array("Foo"))
[1.1] failure: string matching regex `\d+' expected but `F' found
Foo
^
scala> MyParser.main(Array("2 Bar"))
[1.6] failure: Bar is not found in the dictionary!
2 Bar
^
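Outside the REPL, the same idea can be sketched as a single compiled object that hands back the result as a value instead of printing it. The names QuantityParser and parseQuantity, the Either return type, and the tuple result are assumptions added for illustration; the parsers themselves mirror the session above, and the sketch again assumes scala-parser-combinators is available:

```scala
import scala.util.parsing.combinator._

// Hypothetical wrapper object; the dictionary mirrors the one in the session above.
object QuantityParser extends RegexParsers {
  val dictionary = Map("Foo" -> "x")

  // ^^ maps every successful match (total function).
  def number: Parser[Int] = """\d+""".r ^^ { _.toInt }

  def token: Parser[String] = """\w+""".r

  // ^? only succeeds where the partial function is defined;
  // the second argument builds the failure message otherwise.
  def word: Parser[String] =
    token ^? ({
      case x if dictionary.contains(x) => x
    }, {
      case s => s + " is not found in the dictionary!"
    })

  // Turn the n ~ w result into a plain tuple.
  def quantitive: Parser[(Int, String)] =
    number ~ word ^^ { case n ~ w => (n, w) }

  // Right(result) on success, Left(message) on failure or error.
  def parseQuantity(input: String): Either[String, (Int, String)] =
    parseAll(quantitive, input) match {
      case Success(result, _) => Right(result)
      case Failure(msg, _)    => Left(msg)
      case Error(msg, _)      => Left(msg)
    }
}
```

Here QuantityParser.parseQuantity("1 Foo") yields Right((1, "Foo")), and "2 Bar" yields a Left carrying the dictionary message from the ^? error function.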