How to combine parsers with different types of Elem

Posted 2019-04-10 12:11

I'm trying to create a parser that combines Regex parsers with a custom parser of my own. I've looked at Scala: How to combine parser combinators from different objects, but that question and its answers deal with parsers that have the same type of Elem.

Say I have a few RegexParsers, and also a parser that does a lookup for a String:

trait NumbersParsers extends RegexParsers {
  def number = """\d+""".r
}

trait LookupParsers extends Parsers {
  type Elem = String
  def word = elem("word", (potential: String) => dictionary.exists(_.equals(potential)))
}

If I combine these parsers naively

object MyParser extends NumbersParsers with LookupParsers {
  def quantitive = number ~ word
}

I obviously get type errors because of the different types of Elem. How do I combine these parsers?

1 Answer
Deceive
#2 · 2019-04-10 12:45

I feel somewhat responsible for answering this one since I asked and answered Scala: How to combine parser combinators from different objects.

The quick answer is that you can't combine parsers with different types of Elem. A different and more elegant way to solve this problem is to use ^? to augment a regex parser with extra filtering.

It might be helpful to read up on Combinator Parsing in Programming in Scala:

Parser Input

Sometimes, a parser reads a stream of tokens instead of a raw sequence of characters. A separate lexical analyzer is then used to convert a stream of raw characters into a stream of tokens. The type of parser inputs is defined as follows:

type Input = Reader[Elem]   

The class Reader comes from the package scala.util.parsing.input. It is similar to a Stream, but also keeps track of the positions of all the elements it reads. The type Elem represents individual input elements. It is an abstract type member of the Parsers trait:

type Elem

This means that subclasses and subtraits of Parsers need to instantiate class Elem to the type of input elements that are being parsed. For instance, RegexParsers and JavaTokenParsers fix Elem to be equal to Char.

So Elem is used by the lexical analyzer, which is responsible for chopping up your input stream into the smallest possible tokens that the parser wants to deal with. Since you want to deal with regular expressions, your Elem is Char.
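To make the role of Elem concrete, here is a minimal sketch of the token-stream alternative that the quoted passage mentions. It is not the approach recommended in this answer. TokenReader, TokenParsers and parseTokens are hypothetical names introduced for illustration, and the whitespace-splitting "lexer" is an assumption.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.{NoPosition, Position, Reader}

// Hypothetical reader over a list of pre-split tokens (no position tracking).
class TokenReader(tokens: List[String]) extends Reader[String] {
  def first: String = tokens.head
  def rest: Reader[String] = new TokenReader(tokens.tail)
  def pos: Position = NoPosition
  def atEnd: Boolean = tokens.isEmpty
}

object TokenParsers extends Parsers {
  // Every parser in this object consumes whole Strings, not Chars.
  type Elem = String

  val dictionary = Set("Foo")

  // Accept a token made only of digits and convert it to an Int.
  def number: Parser[Int] =
    acceptIf(t => t.nonEmpty && t.forall(_.isDigit))(t => s"number expected but $t found") ^^ (_.toInt)

  // Accept a token only if it appears in the dictionary.
  def word: Parser[String] =
    acceptIf(dictionary.contains)(t => s"$t is not found in the dictionary!")

  def quantitive: Parser[Int ~ String] = number ~ word

  // Assumed "lexer": split the input on whitespace.
  def parseTokens(in: String): ParseResult[Int ~ String] =
    phrase(quantitive)(new TokenReader(in.trim.split("\\s+").toList))
}
```

In this sketch both number and word have to be written against String tokens, which is why regex-based parsers (whose Elem is Char) cannot be mixed in directly.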

But don't worry. Just because your lexer gives you Chars doesn't mean your parser is stuck with them too. What RegexParsers gives you is an implicit conversion from a regex to Parser[String]. You can convert the result further using the ^^ operator (which maps every result with a total function) and the ^? operator (which maps with a partial function and fails where that function is undefined).
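As a small illustration of the difference between the two operators, consider the sketch below. MappingDemo and percent are hypothetical names, and the 0 to 100 range check is only an example of a filter.

```scala
import scala.util.parsing.combinator.RegexParsers

object MappingDemo extends RegexParsers {
  // ^^ applies a total function to every successful result.
  def number: Parser[Int] = """\d+""".r ^^ (_.toInt)

  // ^? applies a partial function; a result outside its domain
  // turns into a parse failure with the supplied message.
  def percent: Parser[Int] = number ^? (
    { case n if n <= 100 => n },
    n => s"$n is not a valid percentage"
  )
}
```

With these definitions, MappingDemo.parseAll(MappingDemo.percent, "42") should succeed, while the input "250" should be rejected by the partial function even though the regex itself matches.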

Let's incorporate them into your parsers:

import scala.util.parsing.combinator._

scala> val dictionary = Map("Foo" -> "x")
dictionary: scala.collection.immutable.Map[String,String] = Map(Foo -> x)

scala> trait NumbersParsers extends RegexParsers {
     |   def number: Parser[Int] = """\d+""".r ^^ { _.toInt }
     | }
defined trait NumbersParsers

scala> trait LookupParsers extends RegexParsers {
     |   def token: Parser[String] = """\w+""".r
     |   def word =
     |     token ^? ({
     |       case x if dictionary.contains(x) => x
     |     }, {
     |       case s => s + " is not found in the dictionary!"
     |     })
     | }
defined trait LookupParsers

scala> object MyParser extends NumbersParsers with LookupParsers {
     |   def quantitive = number ~ word
     |   
     |   def main(args: Array[String]): Unit = {
     |     println(parseAll(quantitive, args(0) ))
     |   }
     | }
defined module MyParser

scala> MyParser.main(Array("1 Foo"))
[1.6] parsed: (1~Foo)

scala> MyParser.main(Array("Foo"))
[1.1] failure: string matching regex `\d+' expected but `F' found

Foo
^

scala> MyParser.main(Array("2 Bar"))
[1.6] failure: Bar is not found in the dictionary!

2 Bar
     ^
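If you want something more convenient than the raw number ~ word pair, the same ^^ operator can unpack it. The following is a sketch under assumptions: Quantity, QuantityParser and parseQuantity are hypothetical names, and the dictionary is the one from the session above.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical result type, introduced only for this sketch.
case class Quantity(amount: Int, unit: String)

object QuantityParser extends RegexParsers {
  val dictionary = Map("Foo" -> "x")

  def number: Parser[Int] = """\d+""".r ^^ (_.toInt)

  // Same dictionary filter as above, written with ^?.
  def word: Parser[String] = """\w+""".r ^? (
    { case w if dictionary.contains(w) => w },
    w => s"$w is not found in the dictionary!"
  )

  // number ~ word yields an Int ~ String; the pattern match unpacks it.
  def quantity: Parser[Quantity] =
    number ~ word ^^ { case n ~ w => Quantity(n, w) }

  // Turn the ParseResult into an Either for the caller.
  def parseQuantity(input: String): Either[String, Quantity] =
    parseAll(quantity, input) match {
      case Success(q, _)   => Right(q)
      case Failure(msg, _) => Left(msg)
      case Error(msg, _)   => Left(msg)
    }
}
```

Returning an Either is just one design choice; matching on the ParseResult directly works equally well if the caller needs the remaining input or the position.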