Scala combinator parser, what does >> mean?

2019-03-11 11:51发布

问题:

I am little bit confusing about ">>" in scala. Daniel said in Scala parser combinators parsing xml? that it could be used to parameterize the parser base on result from previous parser. Could someone give me some example/hint ? I already read scaladoc but still not understand it.

thanks

回答1:

As I said, it serves to parameterize a parser, but let's walk through an example to make it clear.

Let's start with a simple parser, that parses a number follow by a word:

def numberAndWord = number ~ word
def number        = "\\d+".r
def word          = "\\w+".r

Under RegexParsers, this will parse stuff like "3 fruits".

Now, let's say you also want a list of what these "n things" are. For example, "3 fruits: banana, apple, orange". Let's try to parse that to see how it goes.

First, how do I parse "N" things? As it happen, there's a repN method:

def threeThings = repN(3, word)

That will parse "banana apple orange", but not "banana, apple, orange". I need a separator. There's repsep that provides that, but that won't let me specify how many repetitions I want. So, let's provide the separator ourselves:

def threeThings = word ~ repN(2, "," ~> word)

Ok, that words. We can write the whole example now, for three things, like this:

def listOfThings = "3" ~ word ~ ":" ~ threeThings
def word         = "\\w+".r
def threeThings  = word ~ repN(2, "," ~> word)

That kind of works, except that I'm fixing "N" in 3. I want to let the user specify how many. And that's where >>, also known as into (and, yes, it is flatMap for Parser), comes into. First, let's change threeThings:

def things(n: Int) = n match {
  case 1          => word ^^ (List(_))
  case x if x > 1 => word ~ repN(x - 1, "," ~> word) ^^ { case w ~ l => w :: l }
  case x          => err("Invalid repetitions: "+x)
}

This is slightly more complicated than you might have expected, because I'm forcing it to return Parser[List[String]]. But how do I pass a parameter to things? I mean, this won't work:

def listOfThings = number ~ word ~ ":" ~ things(/* what do I put here?*/)

But we can rewrite that like this:

def listOfThings = (number ~ word <~ ":") >> {
  case n ~ what => things(n.toInt)
}

That is almost good enough, except that I now lost n and what: it only returns "List(banana, apple, orange)", not how many there ought to be, and what they are. I can do that like this:

def listOfThings   = (number ~ word <~ ":") >> {
  case n ~ what => things(n.toInt) ^^ { list => new ~(n.toInt, new ~(what, list)) }
}
def number         = "\\d+".r
def word           = "\\w+".r
def things(n: Int) = n match {
  case 1          => word ^^ (List(_))
  case x if x > 1 => word ~ repN(x - 1, "," ~> word) ^^ { case w ~ l => w :: l }
  case x          => err("Invalid repetitions: "+x)
}

Just a final comment. You might have wondered asked yourself "what do you mean flatMap? Isn't that a monad/for-comprehension thingy?" Why, yes, and yes! :-) Here's another way of writing listOfThings:

def listOfThings   = for {
  nOfWhat  <- number ~ word <~ ":"
  n ~ what = nOfWhat
  list     <- things(n.toInt)
}  yield new ~(n.toInt, new ~(what, list))

I'm not doing n ~ what <- number ~ word <~ ":" because that uses filter or withFilter in Scala, which is not implemented by Parsers. But here's even another way of writing it, that doesn't have the exact same semantics, but produce the same results:

def listOfThings   = for {
  n    <- number
  what <- word
  _    <- ":" : Parser[String]
  list <- things(n.toInt)
}  yield new ~(n.toInt, new ~(what, list))

This might even give one to think that maybe the claim that "monads are everywhere" might have something to it. :-)



回答2:

The method >> takes a function that is given the result of the parser and uses it to contruct a new parser. As stated, this can be used to parameterize a parser on the result of a previous parser.

Example

The following parser parses a line with n + 1 integer values. The first value n states the number of values to follow. This first integer is parsed and then the result of this parse is used to construct a parser that parses n further integers.

Parser definition

The following line assumes, that you can parse an integer with parseInt: Parser[Int]. It first parses an integer value n and then uses >> to parse n additional integers which form the result of the parser. So the initial n is not returned by the parser (though it's the size of the returned list).

def intLine: Parser[Seq[Int]] = parseInt >> (n => repN(n,parseInt))

Valid inputs

1 42
3 1 2 3
0

Invalid inputs

0 1
1
3 42 42