I am little bit confusing about ">>" in scala. Daniel said in Scala parser combinators parsing xml? that it could be used to parameterize the parser base on result from previous parser. Could someone give me some example/hint ? I already read scaladoc but still not understand it.
thanks
As I said, it serves to parameterize a parser, but let's walk through an example to make it clear.
Let's start with a simple parser, that parses a number follow by a word:
def numberAndWord = number ~ word
def number = "\\d+".r
def word = "\\w+".r
Under RegexParsers, this will parse stuff like "3 fruits".
Now, let's say you also want a list of what these "n things" are. For example, "3 fruits: banana, apple, orange". Let's try to parse that to see how it goes.
First, how do I parse "N" things? As it happen, there's a repN
method:
def threeThings = repN(3, word)
That will parse "banana apple orange", but not "banana, apple, orange". I need a separator. There's repsep
that provides that, but that won't let me specify how many repetitions I want. So, let's provide the separator ourselves:
def threeThings = word ~ repN(2, "," ~> word)
Ok, that words. We can write the whole example now, for three things, like this:
def listOfThings = "3" ~ word ~ ":" ~ threeThings
def word = "\\w+".r
def threeThings = word ~ repN(2, "," ~> word)
That kind of works, except that I'm fixing "N" in 3. I want to let the user specify how many. And that's where >>
, also known as into
(and, yes, it is flatMap
for Parser
), comes into. First, let's change threeThings
:
def things(n: Int) = n match {
case 1 => word ^^ (List(_))
case x if x > 1 => word ~ repN(x - 1, "," ~> word) ^^ { case w ~ l => w :: l }
case x => err("Invalid repetitions: "+x)
}
This is slightly more complicated than you might have expected, because I'm forcing it to return Parser[List[String]]
. But how do I pass a parameter to things? I mean, this won't work:
def listOfThings = number ~ word ~ ":" ~ things(/* what do I put here?*/)
But we can rewrite that like this:
def listOfThings = (number ~ word <~ ":") >> {
case n ~ what => things(n.toInt)
}
That is almost good enough, except that I now lost n
and what
: it only returns "List(banana, apple, orange)", not how many there ought to be, and what they are. I can do that like this:
def listOfThings = (number ~ word <~ ":") >> {
case n ~ what => things(n.toInt) ^^ { list => new ~(n.toInt, new ~(what, list)) }
}
def number = "\\d+".r
def word = "\\w+".r
def things(n: Int) = n match {
case 1 => word ^^ (List(_))
case x if x > 1 => word ~ repN(x - 1, "," ~> word) ^^ { case w ~ l => w :: l }
case x => err("Invalid repetitions: "+x)
}
Just a final comment. You might have wondered asked yourself "what do you mean flatMap
? Isn't that a monad/for-comprehension thingy?" Why, yes, and yes! :-) Here's another way of writing listOfThings
:
def listOfThings = for {
nOfWhat <- number ~ word <~ ":"
n ~ what = nOfWhat
list <- things(n.toInt)
} yield new ~(n.toInt, new ~(what, list))
I'm not doing n ~ what <- number ~ word <~ ":"
because that uses filter
or withFilter
in Scala, which is not implemented by Parsers
. But here's even another way of writing it, that doesn't have the exact same semantics, but produce the same results:
def listOfThings = for {
n <- number
what <- word
_ <- ":" : Parser[String]
list <- things(n.toInt)
} yield new ~(n.toInt, new ~(what, list))
This might even give one to think that maybe the claim that "monads are everywhere" might have something to it. :-)
The method >>
takes a function that is given the result of the parser and uses it to contruct a new parser. As stated, this can be used to parameterize a parser on the result of a previous parser.
Example
The following parser parses a line with n + 1
integer values. The first value n
states the number of values to follow. This first integer is parsed and then the result of this parse is used to construct a parser that parses n
further integers.
Parser definition
The following line assumes, that you can parse an integer with parseInt: Parser[Int]
. It first parses an integer value n
and then uses >>
to parse n
additional integers which form the result of the parser. So the initial n
is not returned by the parser (though it's the size of the returned list).
def intLine: Parser[Seq[Int]] = parseInt >> (n => repN(n,parseInt))
Valid inputs
1 42
3 1 2 3
0
Invalid inputs
0 1
1
3 42 42