I really like parser combinators, but I'm not happy with the solution I've come up with for extracting data when I don't care about the text that comes before the relevant part.
Consider this small parser to get monetary amounts:
```scala
import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

object MyParser extends JavaTokenParsers {
  def number = floatingPointNumber ^^ (_.toDouble)

  // Accept only known currency codes; anything else fails with a message.
  def currency = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  // An amount may be written as "101.41 EUR" or as "EUR 101.41".
  def amount = (number ~ currency) ^^ {
    case num ~ curr => Amount(num, curr)
  } | currency ~ number ^^ {
    case curr ~ num => Amount(num, curr)
  }

  // Any whitespace-delimited token.
  def junk = """\S+""".r

  // Skip junk tokens one at a time until an amount is found.
  def amountNested: Parser[Any] = amount | junk ~> amountNested
}
```
As you can see, I can easily get `Amount`s back if I give the parser a string that begins with valid data:
```
scala> MyParser.parse(MyParser.amount, "101.41 EUR")
res7: MyParser.ParseResult[Amount] = [1.11] parsed: Amount(101.41,EUR)

scala> MyParser.parse(MyParser.amount, "EUR 102.13")
res8: MyParser.ParseResult[Amount] = [1.11] parsed: Amount(102.13,EUR)
```
However, it fails when there is non-matching text before it:
```
scala> MyParser.parse(MyParser.amount, "I have 101.41 EUR")
res9: MyParser.ParseResult[Amount] =
[1.2] failure: Unknown currency code: I

I have 101.41 EUR
 ^
```
My solution is the `amountNested` parser, which recursively skips tokens until it finds an `Amount`. This works, but it gives a `ParseResult[Any]`:
```
scala> MyParser.parse(MyParser.amountNested, "I have 101.41 EUR")
res10: MyParser.ParseResult[Any] = [1.18] parsed: Amount(101.41,EUR)
```
This loss of type information (which can be "retrieved" using pattern matching, of course) seems unfortunate, because any success will contain an `Amount`.
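Judging by the combinator types, the `Any` seems to come only from the explicit annotation: `amount` is inferred as a `Parser[Amount]`, and `junk ~> amountNested` keeps only the result of the recursive call, so both sides of the `|` can produce an `Amount`. A minimal sketch, assuming Scala 2 with the scala-parser-combinators module on the classpath; `TypedParser` is a hypothetical copy of `MyParser` that differs only in declaring `amountNested` as `Parser[Amount]`:

```scala
import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

// Hypothetical copy of MyParser; only the type of amountNested changes.
object TypedParser extends JavaTokenParsers {
  def number = floatingPointNumber ^^ (_.toDouble)

  def currency = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  // Inferred as Parser[Amount]: both alternatives map to an Amount.
  def amount = (number ~ currency) ^^ {
    case num ~ curr => Amount(num, curr)
  } | currency ~ number ^^ {
    case curr ~ num => Amount(num, curr)
  }

  def junk = """\S+""".r

  // A recursive def needs an explicit type; Parser[Amount] is enough here
  // because `junk ~> p` discards the junk and returns p's result.
  def amountNested: Parser[Amount] = amount | junk ~> amountNested
}
```

With this annotation, `TypedParser.parse(TypedParser.amountNested, "I have 101.41 EUR")` is statically a `ParseResult[Amount]`, so `.get` yields an `Amount` without any pattern matching.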
Is there a way to keep searching my input (`"I have 101.41 EUR"`) until I either find a match or run out of input, without ending up with a `Parser[Any]`?
Looking at the ScalaDocs, it seems like the `*` method on `Parser` might help, but all I get are failures or infinite loops when I try things like:
```scala
def amount2 = ("""\S+""".r *) ~> amount
```
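On this input, the `*` version fails because `*` is `rep`, which is greedy and never gives tokens back: the `\S+` repetition also swallows `101.41` and `EUR`, leaving nothing for `amount`. One possible workaround, sketched assuming Scala 2 with the scala-parser-combinators module available, guards each skipped token with `not(amount)` (`Parsers.not` succeeds without consuming input only when its argument fails), so the repetition stops right before the first position where `amount` matches. `SkippingParser` is a hypothetical name; the grammar is otherwise copied from `MyParser`.

```scala
import scala.util.parsing.combinator._

case class Amount(number: Double, currency: String)

object SkippingParser extends JavaTokenParsers {
  def number = floatingPointNumber ^^ (_.toDouble)

  // Accept only known currency codes, as in MyParser.
  def currency = """\w+""".r ^? ({
    case "USD" => "USD"
    case "EUR" => "EUR"
  }, "Unknown currency code: " + _)

  def amount = (number ~ currency) ^^ {
    case num ~ curr => Amount(num, curr)
  } | currency ~ number ^^ {
    case curr ~ num => Amount(num, curr)
  }

  def junk = """\S+""".r

  // Skip a token only if an amount does NOT start at the current position.
  // not(amount) consumes no input; when amount would succeed, it fails,
  // which ends the rep, and the trailing amount then parses the match.
  def amount2: Parser[Amount] = rep(not(amount) ~> junk) ~> amount
}
```

The trade-off is that the matching amount is parsed twice, once inside `not` and once for real; for short inputs like these that is unlikely to matter.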