I've got a working Scala parser, but the solution is not as clean as I would like. The problem is that some of the productions must treat whitespace as part of the token, while the "higher-level" productions should be able to ignore/skip the whitespace.
If I use the typical Scala parser pattern of extending the lower-level parsers, then the skipWhitespace setting is inherited and things get messy very quickly.
I think I would be better off not using the extends approach, and instead holding an instance of the low-level parser inside the higher-level parser's class, but I'm not sure how to make that work so that both parsers see a single stream of input characters.
Here is part of the lowest-level parser:
class VulgarFractionParser extends RegexParsers {
  override type Elem = Char
  override val whiteSpace = "".r
Then I extend that like this:
class NumberParser extends VulgarFractionParser with Positional {
But at this point the NumberParser must handle whitespace explicitly, just like the VulgarFractionParser. For the NumberParser it is still pretty manageable, but at the next level up I really want to be able to just define productions that use whitespace as a separator, the way a normal RegexParsers grammar would.
An example would be something like:
IBM 33.33/ 1200.00
or
IBM 33.33/33.50 1200.00
The second value sometimes has two parts separated by a "/", and sometimes only a single part with nothing after the slash (or even no slash at all).
def bidOrAskPrice = ("$"?) ~> (bidOrAskPrice1 | bidOrAskPrice2 | bidOrAskPrice3)

def bidOrAskPrice1 = number ~ ("/".r) ~ number ~ SPACES ^^ {
  case a ~ slash ~ b ~ sp1 => BidOrAsk(a, Some(b))
}

def bidOrAskPrice2 = number ~ "/" ~ SPACES ^^ { case a ~ slash ~ sp => BidOrAsk(a, None) }

def bidOrAskPrice3 = number ~ (SPACES?) ^^ { case a ~ sp => BidOrAsk(a, None) }
One solution is to override the handleWhiteSpace method and toggle whitespace skipping with a var in your extending class.
You can see the source of RegexParsers here: https://github.com/scala/scala/blob/v2.9.2/src/library/scala/util/parsing/combinator/RegexParsers.scala
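For instance, here is a rough sketch of that idea; the names ToggleWhitespaceParsers, skipWS and noWS are mine, not anything from the library:

import scala.util.parsing.combinator.RegexParsers

class ToggleWhitespaceParsers extends RegexParsers {
  // Mutable switch consulted every time a literal or regex parser is applied.
  var skipWS = true

  override protected def handleWhiteSpace(source: java.lang.CharSequence, offset: Int): Int =
    if (skipWS) super.handleWhiteSpace(source, offset) else offset

  // Runs a sub-parser with whitespace skipping turned off, restoring the
  // previous setting afterwards, so only the wrapped productions become
  // whitespace-sensitive.
  def noWS[T](p: => Parser[T]): Parser[T] = Parser { in =>
    val saved = skipWS
    skipWS = false
    try p(in) finally skipWS = saved
  }
}

Low-level productions can then be wrapped as, say, noWS(number ~ "/" ~ number), while everything else keeps the normal skipping behaviour.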
Doesn't it make more sense to turn the first parser into a token parser (a lexer, really), and make the second parser read that instead of plain Char?
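For example, here is a rough sketch of that two-phase setup using Lexical and TokenParsers from the standard parser combinators. The token names (NumberTok, WordTok, SlashTok), the QuoteLexer/QuoteParser classes and their grammar are invented for illustration, not taken from the question:

import scala.util.parsing.combinator.lexical.Lexical
import scala.util.parsing.combinator.syntactical.TokenParsers
import scala.util.parsing.combinator.token.Tokens

// Token types for the quote lines above (names are illustrative).
trait QuoteTokens extends Tokens {
  case class NumberTok(chars: String) extends Token
  case class WordTok(chars: String) extends Token
  case object SlashTok extends Token { def chars = "/" }
}

// The lexer owns every character-level decision, including whitespace.
class QuoteLexer extends Lexical with QuoteTokens {
  def token: Parser[Token] =
    ( rep1(digit) ~ opt('.' ~> rep1(digit)) ^^ {
        case i ~ f => NumberTok(i.mkString + f.map("." + _.mkString).getOrElse(""))
      }
    | rep1(letter) ^^ (cs => WordTok(cs.mkString))
    | '/' ^^^ SlashTok
    | failure("unexpected character")
    )

  // Whitespace is consumed here, between tokens, and never reaches the grammar.
  def whitespace: Parser[Any] = rep(whitespaceChar)
}

// The grammar reads whole tokens, so it never has to think about spaces.
class QuoteParser extends TokenParsers {
  type Tokens = QuoteTokens
  val lexical = new QuoteLexer
  import lexical._

  def word: Parser[String] =
    elem("symbol", _.isInstanceOf[WordTok]) ^^ (_.chars)
  def number: Parser[Double] =
    elem("number", _.isInstanceOf[NumberTok]) ^^ (_.chars.toDouble)
  def slash = elem(SlashTok)

  // "IBM 33.33/33.50 1200.00", "IBM 33.33/ 1200.00" or "IBM 33.33 1200.00".
  // The first alternative needs both bid and ask; if that leaves no number
  // for the size, the second alternative is tried instead.
  def quote: Parser[(String, Double, Option[Double], Double)] =
    ( word ~ number ~ (slash ~> number) ~ number ^^ {
        case sym ~ bid ~ ask ~ size => (sym, bid, Some(ask), size)
      }
    | word ~ number ~ opt(slash) ~ number ^^ {
        case sym ~ bid ~ _ ~ size => (sym, bid, None, size)
      }
    )

  def parseQuote(s: String) = phrase(quote)(new lexical.Scanner(s))
}

With something like this, new QuoteParser().parseQuote("IBM 33.33/33.50 1200.00") should yield ("IBM", 33.33, Some(33.5), 1200.0), and "IBM 33.33/ 1200.00" falls through to the second alternative; the grammar itself never has to reason about the whitespace around the slash, because that is settled in the lexer.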