I'm writing a DSL using Scala's parser combinators. I have recently changed my base class from StandardTokenParsers to JavaTokenParsers to take advantage of the regex features I think I need for one last piece of the puzzle. (see Parsing a delimited multiline string using scala StandardTokenParser)
What I am trying to do is extract a block of text delimited by some characters ({{ and }} in this example). This block of text can span multiple lines. What I have so far is:
def docBlockRE = regex("""(?s)(?!}}).*""".r)
def docBlock: Parser[DocString] =
  "{{" ~> docBlockRE <~ "}}" ^^ { case str => new DocString(str) }
where DocString is a case class in my DSL. However, this doesn't work. It fails if I feed it the following:
{{
abc
}}
{{
abc
}}
I'm not sure why this fails. If I put a debug wrapper around the parser (http://jim-mcbeath.blogspot.com/2011/07/debugging-scala-parser-combinators.html), I get the following:
docBlock.apply for token
at position 10.2 offset 165 returns [19.1] failure: `}}' expected but end of source found
If I try a single delimited block with multiple lines:
{{
abc
def
}}
then it also fails to parse with:
docBlock.apply for token
at position 10.2 offset 165 returns [16.1] failure: `}}' expected but end of source found
If I remove the DOTALL directive (?s) then I can parse multiple single-line blocks (which doesn't really help me much).
Is there any way of combining multi-line regex with negative lookahead?
One other issue I have with this approach is that, no matter what I do, the closing delimiter must be on a separate line from the text. Otherwise I get the same kind of error message I see above. It is almost like the negative lookahead isn't really working as I expect it to.
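To make the lookahead behaviour easier to see, here is a minimal sketch that runs the regex outside the parser, using only scala.util.matching.Regex. The object name LookaheadDemo, the sample input, and the perChar pattern are assumptions made for illustration; perChar is a hypothetical variant that is not part of the code above. As far as I can tell, a lookahead placed before `.*` is only tested once, at the position where the match starts, so with DOTALL the `.*` is then free to run to the end of the input, which would be consistent with the "end of source found" failures.

```scala
import scala.util.matching.Regex

object LookaheadDemo {
  // Pattern from the question: the lookahead is tested once, at the start of the match.
  val once: Regex = """(?s)(?!}}).*""".r

  // Hypothetical variant: the lookahead is re-tested before every single character.
  val perChar: Regex = """(?s)(?:(?!}}).)*""".r

  // Sample text as it would look just after the opening "{{" has been consumed.
  val input = "\nabc\ndef\n}}\nrest"

  // Match anchored at the start of the input, similar to how a regex parser is applied.
  def prefix(re: Regex): String = re.findPrefixOf(input).getOrElse("")

  def main(args: Array[String]): Unit = {
    println(prefix(once) == input)             // true: the match runs past "}}" to the end
    println(prefix(perChar) == "\nabc\ndef\n") // true: the match stops just before "}}"
  }
}
```

Without (?s), `.` does not match a newline, so `.*` stops at the end of the current line. That may be why single-line blocks parse only when the closing delimiter sits on its own line.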