I have a working parser, but I've just realised I do not cater for comments. In the DSL I am parsing, comments start with a ; character. If a ; is encountered, the rest of the line is ignored (not all of it however, unless the first character is ;).
I am extending RegexParsers for my parser and ignoring whitespace (the default way), so I am losing the newline characters anyway. I don't wish to modify each and every parser I have to cater for the possibility of comments either, because statements can span multiple lines (thus each part of each statement may end with a comment). Is there any clean way to achieve this?
One thing that may influence your choice is whether comments can be found within your valid parsers. For instance let's say you have something like:
val p = "(" ~> "[a-z]*".r <~ ")"
which would parse something like ( abc )
but because of comments you could actually encounter something like:
( ; comment goes here
abc
)
Then I would recommend using TokenParsers or one of its subclasses. It's more work because you have to provide a lexical parser that will do a first pass to discard the comments. But it is also more flexible if you have nested comments, if the ; can be escaped, or if the ; can appear inside a string literal like:
abc = "; don't ignore this" ; ignore this
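As a rough sketch of that approach (not a complete solution for your DSL), one option is to extend StandardTokenParsers and override the whitespace parser of its StdLexical so that everything from ; to the end of the line is skipped between tokens. The AssignmentParser object, its assignment rule and the parse helper are made-up names for illustration, and the grammar only covers the single line shown above:

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import scala.util.parsing.input.CharArrayReader.EofCh

// Hypothetical mini-grammar: name = "string literal", with ; comments.
object AssignmentParser extends StandardTokenParsers {
  // Lexer that treats "; ... end of line" as whitespace between tokens.
  // String literals are tokenised before this applies, so a ; inside
  // quotes is kept.
  override val lexical: StdLexical = new StdLexical {
    override def whitespace: Parser[Any] =
      rep[Any](whitespaceChar | ';' ~ rep(chrExcept(EofCh, '\n')))
  }
  lexical.delimiters += "="

  // ident and stringLit are provided by StdTokenParsers.
  def assignment: Parser[(String, String)] =
    ident ~ ("=" ~> stringLit) ^^ { case name ~ value => (name, value) }

  // Illustrative helper: Some((name, value)) on success, None otherwise.
  def parse(input: String): Option[(String, String)] =
    phrase(assignment)(new lexical.Scanner(input)) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

Because the lexer sees the opening quote before it ever looks for a comment, the ; inside the literal survives while the trailing comment is dropped. Nested or escaped comment markers would need a more elaborate whitespace parser than this one.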
On the other hand, you could also try to override the value of whiteSpace to be something like
override protected val whiteSpace = """(\s|;.*)+""".r
Or something along those lines.
For instance using the example from the RegexParsers scaladoc:
import scala.util.parsing.combinator.RegexParsers
object so1 {
Calculator("""(1 + ; foo
(1 + 2))
; bar""")
}
object Calculator extends RegexParsers {
override protected val whiteSpace = """(\s|;.*)+""".r
def number: Parser[Double] = """\d+(\.\d*)?""".r ^^ { _.toDouble }
def factor: Parser[Double] = number | "(" ~> expr <~ ")"
def term: Parser[Double] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ {
case number ~ list => (number /: list) {
case (x, "*" ~ y) => x * y
case (x, "/" ~ y) => x / y
}
}
def expr: Parser[Double] = term ~ rep("+" ~ log(term)("Plus term") | "-" ~ log(term)("Minus term")) ^^ {
case number ~ list => list.foldLeft(number) { // same as before, using alternate name for /:
case (x, "+" ~ y) => x + y
case (x, "-" ~ y) => x - y
}
}
def apply(input: String): Double = parseAll(expr, input) match {
case Success(result, _) => result
case failure: NoSuccess => scala.sys.error(failure.msg)
}
}
This prints:
Plus term --> [2.9] parsed: 2.0
Plus term --> [2.10] parsed: 3.0
res0: Double = 4.0
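To check what that whiteSpace pattern actually treats as skippable, it can be tried on its own with nothing but the standard library. This is only a small sketch; WhiteSpaceDemo and skipLeading are names made up for the illustration:

```scala
object WhiteSpaceDemo {
  // Same pattern as the whiteSpace override above.
  private val whiteSpace = """(\s|;.*)+""".r

  // Drops the leading run of whitespace and ; comments, roughly what
  // RegexParsers does before it tries each literal or regex.
  def skipLeading(input: String): String =
    whiteSpace.findPrefixOf(input) match {
      case Some(skipped) => input.drop(skipped.length)
      case None          => input
    }
}
```

For example, WhiteSpaceDemo.skipLeading("  ; first\n ; second\n abc ; trailing") returns "abc ; trailing". Since . does not match a newline by default, each ;.* stops at the end of its line and the \s alternative then consumes the line break, which is why several comment lines in a row are skipped in one go.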
Just filter out all the comments with a regex before you pass the code into your parser.
def removeComments(input: String): String = {
"""(?ms)\".*?\"|;.*?$|.+?""".r.findAllIn(input).map(str => if(str.startsWith(";")) "" else str).mkString
}
val code =
"""abc "def; ghij"
abc ;this is a comment
def"""
println(removeComments(code))