I am working on a grammar that is basically an island grammar.
Let's say the "island" is everything between braces and the "sea" is everything else. Like this:
{ (island content) }
Then this simple grammar works:
IslandStart
:
'{' -> pushMode(Island)
;
Fluff
:
~[\{\}]+
;
....
But I'm having trouble coming up with a similar solution for the case where I want a complex (multi-character) opening sequence for my "island" block, like this:
{# (island content) }
In this case I don't know how to make a rule for "Fluff" (everything but my opening sequence).
IslandStart
:
'{#' -> pushMode(Island)
;
Fluff
:
~[\{\}]+ /* Should now include opening braces as well
if they are not immediately followed by a # sign */
;
How do I make it work?
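To make the requirement concrete, here is a minimal sketch of the kind of rule I am after. It assumes the lexer's longest-match behaviour is enough to tell '{#' apart from a lone '{'; the grammar name IslandSketch is made up, and the rules for the island content are left out, as above. I have not verified that this is the right way to do it.

```antlr
lexer grammar IslandSketch; // hypothetical name, for illustration only

// '{#' is two characters long, so it should win over the
// one-character '{' alternative of Fluff when both can match.
IslandStart
    : '{#' -> pushMode(Island)
    ;

Fluff
    : ~[{}]+   // a run of sea characters that contains no braces
    | '{'      // a lone opening brace, i.e. one not followed by '#'
    ;

mode Island;

IslandEnd
    : '}' -> popMode
    ;
```

With a rule like this, a lone '{' would become a Fluff token of its own, so the parser would still have to accept Fluff+ rather than a single Fluff.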
EDIT: GRosenberg came up with a solution but I get a lot of tokens (one per character) with it. This is an example to demonstrate this behaviour:
My lexer grammar:
lexer grammar Demolex;
IslandStart
:
'{$' -> pushMode(Island)
;
Fluff
:
'{' ~'$' .* // any 2+ char seq that starts with '{', but not '{$'
| '{' '$$' .* // starts with hypothetical not IslandStart marker
| '{' // just the 1 char
| .*? ~'{' // minimum sequence that ends before an '{'
;
mode Island;
IslandEnd
:
'}' -> popMode
;
Simplest parser grammar:
grammar Demo;
options { tokenVocab = Demolex; }
template
:
Fluff+
;
For the input "somanytokens", this generates a tree with a lot of tokens (one per character) when I debug it in the ANTLR 4 plugin for Eclipse.
It's not likely that this is a plugin problem: I can easily come up with a token definition that results in one big fat token in the tree.
Actually, even the simplest form of grammar gives this result:
grammar Demo2;
template4
:
Fluff+
;
Fluff
:
.*? ~'{' // minimum sequence that ends before an '{'
;