Suppose I'm writing a rudimentary SQL parser in Scala. I have the following:
    import scala.util.parsing.combinator.RegexParsers

    class Arith extends RegexParsers {
      def selectstatement: Parser[Any] = selectclause ~ fromclause
      def selectclause: Parser[Any] = "(?i)SELECT".r ~ tokens
      def fromclause: Parser[Any] = "(?i)FROM".r ~ tokens
      def tokens: Parser[Any] = rep(token) // how to make this non-greedy?
      def token: Parser[Any] = "(\\s*)\\w+(\\s*)".r
    }
When trying to match selectstatement against SELECT foo FROM bar, how do I prevent the selectclause from gobbling up the entire phrase due to the rep(token) in ~ tokens?
In other words, how do I specify non-greedy matching in Scala?
To clarify, I'm fully aware that I can use standard non-greedy syntax (*?) or (+?) within the String pattern itself, but I wondered if there's a way to specify it at a higher level inside def tokens. For example, if I had defined token like this:
    def token: Parser[Any] = stringliteral | numericliteral | columnname
Then how can I specify non-greedy matching for the rep(token) inside def tokens?
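Not easily, because a successful match is not retried. Consider, for example, a parser p made of a parenthesized group of alternatives followed by a second parser, where more than one of the alternatives can match the start of the input.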
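A minimal sketch of such a case, assuming the scala-parser-combinators library is available; the object name `X` and the string literals are illustrative choices:

```scala
import scala.util.parsing.combinator.RegexParsers

object X extends RegexParsers {
  // Alternatives are tried left to right. Once "a" succeeds, the group is
  // done: "aa", "aaa" and "aaaa" are never tried, even if "ab" fails later.
  def p: Parser[String ~ String] = ("a" | "aa" | "aaa" | "aaaa") ~ "ab"
}

// X.parseAll(X.p, "aaaab") is a failure: "a" consumes one character and
// "ab" cannot match the remaining "aaab". The alternative "aaa", which
// would have let the whole input parse, is never reached.
```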
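The first match, in the parser inside the parentheses, was successful, so parsing proceeded to the next parser. That one failed, so p failed. If p were one of several alternatives, the next alternative would be tried, so the trick is to produce something that can handle that sort of thing.

Let's say we have a combinator, nonGreedy, that takes the parser to repeat plus a terminal parser and, before each repetition, checks whether the terminal matches, stopping as soon as it does. You can then use it in place of rep, passing as the terminal whatever has to follow the repetition.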
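One way such a combinator could be written, as a sketch rather than a definitive implementation. The parameter names `item` and `terminal`, the object name `NonGreedyDemo`, and the sample parser `p` are assumptions. Note that this version consumes the terminal and returns its result as the last element of the list, so both parsers must produce the same type.

```scala
import scala.util.parsing.combinator.RegexParsers

object NonGreedyDemo extends RegexParsers {
  // Repeat `item` until `terminal` matches. The terminal is tried first at
  // every position, so the repetition stops at the earliest point it can.
  def nonGreedy[T](item: => Parser[T], terminal: => Parser[T]): Parser[List[T]] =
    Parser { in =>
      def recurse(in: Input, items: List[T]): ParseResult[List[T]] =
        terminal(in) match {
          // Terminal matched: consume it, append its result, and stop.
          case Success(x, rest) => Success((x :: items).reverse, rest)
          case _ =>
            item(in) match {
              case Success(x, rest) => recurse(rest, x :: items)
              case ns: NoSuccess    => ns // neither parser matched
            }
        }
      recurse(in, Nil)
    }

  // Usage: zero or more "a", ending at the first "ab".
  def p: Parser[List[String]] = nonGreedy(literal("a"), literal("ab"))
}
```

With this definition, `NonGreedyDemo.parseAll(NonGreedyDemo.p, "aaaab")` should succeed, because `"ab"` is checked before each `"a"` and only matches once three characters have been consumed.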
By the way, I always found that looking at how other things are defined is helpful in trying to come up with stuff like nonGreedy above. In particular, look at how rep1 is defined, and how it was changed to avoid re-evaluating its repetition parameter; the same thing would probably be useful on nonGreedy.

Here's a full solution, with a little change to avoid consuming the "terminal".
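A sketch of what that could look like, under stated assumptions: the trait name `NonGreedy`, the `val` names, and the `eof` pattern are illustrative, and the grammar is the `Arith` class from the question with each clause taking the parser that must follow it as a parameter.

```scala
import scala.util.parsing.combinator.{Parsers, RegexParsers}

trait NonGreedy extends Parsers {
  // Repeat `item` until `terminal` would match. The terminal is only used
  // as a lookahead: on success we return the ORIGINAL input, unconsumed.
  def nonGreedy[T, U](item: => Parser[T], terminal: => Parser[U]): Parser[List[T]] =
    Parser { in =>
      def recurse(in: Input, items: List[T]): ParseResult[List[T]] =
        terminal(in) match {
          case _: Success[_] => Success(items.reverse, in)
          case _ =>
            item(in) match {
              case Success(x, rest) => recurse(rest, x :: items)
              case ns: NoSuccess    => ns
            }
        }
      recurse(in, Nil)
    }
}

class Arith extends RegexParsers with NonGreedy {
  // vals, so each pattern is turned into a parser only once
  val select: Parser[String] = "(?i)SELECT".r
  val from: Parser[String]   = "(?i)FROM".r
  val token: Parser[String]  = "(\\s*)\\w+(\\s*)".r
  val eof: Parser[String]    = """\z""".r // matches only at end of input

  def selectstatement: Parser[Any] = selectclause(from) ~ fromclause(eof)
  def selectclause(terminal: Parser[Any]): Parser[Any] = select ~ tokens(terminal)
  def fromclause(terminal: Parser[Any]): Parser[Any]   = from ~ tokens(terminal)
  def tokens(terminal: Parser[Any]): Parser[Any]       = nonGreedy(token, terminal)
}
```

Because the terminal is not consumed here, `selectclause(from)` stops just before FROM and leaves it for `fromclause` to match, so `parseAll(selectstatement, "SELECT foo FROM bar")` on an `Arith` instance should succeed.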