I have an ANTLR4 grammar for a domain-specific language that is embedded in a text template.
There are two modes:
- Text (whitespace should be preserved)
- Code (whitespace should be ignored)
Sample grammar part:
template
: '{' templateBody '}'
;
templateBody
: templateChunk*
;
templateChunk
: code # codeChunk // dsl code, ignore whitespace
| text # textChunk // any text, preserve whitespace
;
The rule for code may contain a nested reference to the template rule. So the parser must support nesting whitespace/non-whitespace sections.
Maybe lexer modes can help - with some drawbacks:
- the code sections must be parsed in another compiler pass
- I doubt that nested sections could be mapped correctly
Yet the most promising approach seems to be the manipulation of hidden channels.
My question: Is there a best practice to fulfill these requirements? Is there an example grammar that has already solved similar problems?
Appendix:
The rest of the grammar could look as follows:
code
: '@' function
;
function
: Identifier '(' argument ')'
;
argument
: function
| template
;
text
: Whitespace+
| Identifier
| .+
;
Identifier
: LETTER (LETTER|DIGIT)*
;
Whitespace
: [ \t\n\r] -> channel(HIDDEN)
;
fragment LETTER
: [a-zA-Z]
;
fragment DIGIT
: [0-9]
;
In this example, code has a dummy implementation, pointing out that it can contain nested code/template sections. Actually, code should support:
- multiple arguments
- arguments of primitive types (ints, strings, ...)
- maps and lists
- function evaluation
- ...
This is how I solved the problem in the end:
The idea is to enable/disable whitespace in a parser rule:
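A minimal sketch of such a rule, assuming the two actions wrap the whitespace-preserving templateBody rule (the exact placement is an assumption, and the actions are only available if the generated parser extends a base class that defines them):

```antlr
// Sketch only: turn whitespace on while parsing the template body,
// and turn it off again when leaving it.
templateBody
    : {enableWs();} templateChunk* {disableWs();}
    ;
```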
So we will have to define enableWs and disableWs in our parser base class.
Now what is this MultiChannelTokenStream?
- ANTLR4 has CommonTokenStream, which is a token stream reading only from one channel.
- MultiChannelTokenStream is a token stream reading from the enabled channels. For the implementation I took the source code of CommonTokenStream and replaced each reference to the channel by channels (the equality comparison becomes a contains comparison).
An example implementation with the grammar above can be found at antlr4multichannel
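
The channel-set idea can be illustrated without the ANTLR runtime. The following is a minimal stand-alone model, not the actual MultiChannelTokenStream: the names Tok, MultiChannelModel and WsSwitch are hypothetical, and a real implementation would adapt the CommonTokenStream source as described above. It only shows how replacing the single channel with a set of enabled channels changes which tokens are served, and how enableWs/disableWs could delegate to it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an ANTLR token: its text and its channel number. */
final class Tok {
    final String text;
    final int channel;

    Tok(String text, int channel) {
        this.text = text;
        this.channel = channel;
    }
}

/** Toy model of a token stream that serves tokens from a set of enabled channels. */
class MultiChannelModel {
    static final int DEFAULT = 0; // same value as ANTLR's Token.DEFAULT_CHANNEL
    static final int HIDDEN = 1;  // same value as ANTLR's Token.HIDDEN_CHANNEL

    private final List<Tok> tokens;
    private final Set<Integer> channels = new HashSet<>();
    private int pos = 0;

    MultiChannelModel(List<Tok> tokens) {
        this.tokens = tokens;
        channels.add(DEFAULT); // only the default channel is visible at first
    }

    void enable(int channel) { channels.add(channel); }

    void disable(int channel) { channels.remove(channel); }

    /** Returns the next token on an enabled channel, or null at end of input. */
    Tok next() {
        while (pos < tokens.size()) {
            Tok t = tokens.get(pos++);
            // "contains" replaces the "==" check of a single-channel stream
            if (channels.contains(t.channel)) {
                return t;
            }
        }
        return null;
    }
}

/** Hypothetical model of the parser base class methods called from the grammar. */
class WsSwitch {
    private final MultiChannelModel input;

    WsSwitch(MultiChannelModel input) { this.input = input; }

    void enableWs() { input.enable(MultiChannelModel.HIDDEN); }

    void disableWs() { input.disable(MultiChannelModel.HIDDEN); }
}
```

With the hidden channel disabled, whitespace tokens are skipped as in a single-channel stream; after enableWs() they are returned like any other token, so a text rule can match them.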