I have been trying to build a SQL parser with the scala-parser-combinator library, which I've simplified greatly into the code below.
class Expression
case class FalseExpr() extends Expression
case class TrueExpr() extends Expression
case class AndExpression(expr1: Expression, expr2: Expression) extends Expression
object SimpleSqlParser {
def parse(sql: String): Try[Expression] = new SimpleSqlParser().parse(sql)
}
class SimpleSqlParser extends RegexParsers {
def parse(sql: String): Try[_ <: Expression] = parseAll(expression, sql) match {
case Success(matched,_) => scala.util.Success(matched)
case Failure(msg,remaining) => scala.util.Failure(new Exception("Parser failed: "+msg + "remaining: "+ remaining.source.toString.drop(remaining.offset)))
case Error(msg,_) => scala.util.Failure(new Exception(msg))
}
private def expression: Parser[_ <: Expression] =
andExpr | falseExpr | trueExpr
private def falseExpr: Parser[FalseExpr] =
"false" ^^ (_ => FalseExpr())
private def trueExpr: Parser[TrueExpr] = "true" ^^ (_ => TrueExpr())
private def andExpr: Parser[Expression] =
expression ~ "and" ~ expression ^^ { case e1 ~ and ~ e2 => AndExpression(e1,e2)}
}
Without the 'and' parsing, it works fine. But I want to be able to parse things like 'true AND (false OR true)', for example. When I add the 'and' part to the definition of an expression, I get a StackOverflowError, the stack is alternating between the definitions of 'and' and 'expression'.
I understand why this is happening - the definition of expression begins with and, and vice-versa. But this seems like the most natural way to model this problem. In reality an expression could also be LIKE, EQUALS etc. Is there another way to model this kind of thing in general in order to get around the problem of recursive definitions.
scala.util.parsing.combinator.RegexParsers
cannot handle left-recursive grammars. Your grammar can be summarized by the following production rules:expression
is indirectly left-recursive viaandExpr
.To avoid the infinite recursion, you need to reformulate the grammar so that it is not left-recursive anymore. One frequently-used way is to use repetition combinators, such as
chainl1
:Live on Scastie
The new
expression
matches one or morefalseExpr
/trueExpr
, separated by"and"
, and combines the matched elements withAndExpression
in a left-associative way. Conceptually, it corresponds to the following production rule:If your grammar contains many tangled left-recursive production rules, you might want to consider other parser combinator libraries, such as GLL combinators, that directly support left recursion.