I'm fairly new to Scala and while reading about parser combinators(The Magic Behind Parser Combinators, Domain-Specific Languages in Scala) I came across method definitions like this:
def classPrefix = "class" ~ ID ~ "(" ~ formals ~ ")"
I've been reading throught the API doc of scala.util.parsing.Parsers which defines a method named (tilde) but I still dont't really understand its usage in the example above. In that example (tilde) is a method which is called on java.lang.String which doesn't have that method and causes the compiler to fail. I know that (tilde) is defined as
case class ~ [+a, +b] (_1: a, _2: b)
but how does this help in the example above?
I'd be happy if someone could give me a hint to understand what's going on here. Thank you very much in advance!
Jan
The
~
method on parser combines two parser in one which applies the two original parsers successively and returns the two results. That could be simply (inParser[T]
)If you never combined more than two parsers, that would be ok. However, if you chain three of them,
p1
,p2
,p3
, with return typesT1
,T2
,T3
, thenp1 ~ p2 ~ p3
, which meansp1.~(p2).~(p3)
is of typeParser[((T1, T2), T3)]
. And if you combine five of them as in your example, that would beParser[((((T1, T2), T3), T4), T5)]
. Then when you pattern match on the result, you would have all those parantheses too :This is quite uncomfortable.
Then comes a clever syntactic trick. When a case class has two parameters, it can appears in infix rather than prefix position in a pattern. That is, if you have
case class X(a: A, b: B)
, you can pattern match withcase X(a, b)
, but also withcase a X b
. (That is what is done with a patternx::xs
to match a non empty List,::
is a case class). When you write casea ~ b ~ c
, it meanscase ~(~(a,b), c)
, but is much more pleasant, and more pleasant thancase ((a,b), c)
too, which is tricky to get right.So the
~
method in Parser returns aParser[~[T,U]]
instead of aParser[(T,U)]
, so you can pattern match easily on the result of multiple ~. Beside that,~[T,U]
and(T,U)
are pretty much the same thing, as isomorphic as you can get.The same name is chosen for the combining method in parser and for the result type, because the resulting code is natural to read. One sees immediately how each part in the result processing relates to the items of the grammar rule.
Tilda is chosen because its precedence (it binds tightly) plays nicely with the other operators on parser.
One last point, there are auxiliary operators
~>
and<~
which discard the result of one of the operand, typically the constant parts in the rule which carries no useful data. So one would rather writeand get only the values of ID and formals in the result.
The structure here is a little bit tricky. First, notice that you always define these things inside a subclass of some parser, e.g.
class MyParser extends RegexParsers
. Now, you may note two implicit definitions insideRegexParsers
:What these will do is take any string or regex and convert them into a parser that matches that string or that regex as a token. They're implicit, so they'll be applied any time they're needed (e.g. if you call a method on
Parser[String]
thatString
(orRegex
) does not have).But what is this
Parser
thing? It's an inner class defined insideParsers
, the supertrait forRegexParser
:Looks like it's a function that takes input and maps it to a result. Well, that makes sense! And you can see the documentation for it here.
Now we can just look up the
~
method:So, if we see something like
what happens is, first,
"fish"
does not have the~
method, so it's implicitly converted toParser[String]
which does. The~
method then wants an argument of typeParser[U]
, and so we implicitly convert"swim"
intoParser[String]
(i.e.U
==String
). Now we have something that will match an input"fish"
, and whatever is left in the input should match"swim"
, and if both are the case, thenseaFacts
will succeed in its match.You should checkout Parsers.Parser. Scala sometimes defines method and case class with the same name to aid pattern matching etc, and it's a little confusing if you're reading the Scaladoc.
In particular,
"class" ~ ID
is same as"class".~(ID)
.~
is a method that combines the parser with another parser sequentially.There's an implicit conversion defined in
RegexParsers
that automatically creates a parser from aString
value. So,"class"
automatically becomes an instance ofParser[String]
.RegexParsers
also defines another implicit conversion that automatically creates parser from aRegex
value. So,ID
automatically becomes an instance ofParser[String]
too.By combining two parsers,
"class" ~ ID
returns aParser[String]
that matches the literal "class" and then the regular expressionID
appearing sequentially. There are other methods like|
and|||
. For more info, read Programming in Scala.