Consider this very simplified example, where input of the following form should be matched:

```
mykey -> This is the value
```
My real case is much more complex, but this will do for showing what I'm trying to achieve. `mykey` is an `ID`, while on the right side of `->` we have a set of `Word`s. If I use
```antlr
grammar Root;

parse
    : ID '->' value
    ;

value
    : Word+
    ;

ID
    : ('a'..'z')+
    ;

Word
    : ('a'..'z' | 'A'..'Z' | '0'..'9')+
    ;

WS
    : ' ' -> skip
    ;
```
the example won't be parsed, because the lexer produces an `ID` token for the first `is`, which is not matched by `Word+`. In my real example, the `value` language is vastly different, and I'd like to parse it with a different grammar.
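To make the tokenization problem concrete, here is a small plain-Java sketch (not ANTLR code; the class name `RuleOrderLexer` and the token name `ARROW` are hypothetical) that mimics the two disambiguation rules an ANTLR4 lexer applies: the longest match wins, and on a tie the rule defined first wins. Because `ID` is defined before `Word` and both match `is` with the same length, `is` becomes an `ID`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not ANTLR's implementation: a toy lexer that
// picks the longest match and breaks ties by rule definition order.
public class RuleOrderLexer {
    // Rules in definition order; "ARROW" stands in for the implicit '->' token.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ARROW", Pattern.compile("->"));
        RULES.put("ID", Pattern.compile("[a-z]+"));
        RULES.put("Word", Pattern.compile("[a-zA-Z0-9]+"));
        RULES.put("WS", Pattern.compile(" "));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so a tie goes to the rule defined earlier.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("no rule matches at " + pos);
            }
            if (!bestType.equals("WS")) { // WS is skipped, as in the grammar
                tokens.add(bestType + ":" + input.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID:mykey, ARROW:->, Word:This, ID:is, ID:the, ID:value]
        System.out.println(tokenize("mykey -> This is the value"));
    }
}
```

Only `This` ends up as a `Word` (its capital `T` rules out `ID`); `is`, `the` and `value` all become `ID`, so `Word+` cannot match the value.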
I have considered different solutions:

- Switching the lexer `mode`, but AFAIK switching the lexer to a different mode can only happen in a lexer rule. This is problematic for this case and for my real case as well, since there are no unique tokens that start and end the `value` part. What I would need is something like "tokenize `value` with different rules", which of course makes no sense, because lexer and parser act independently and, as soon as the parser starts, everything is already tokenized.
- Using a different grammar for `value`. If I understand it correctly, importing a grammar won't work, since it always combines the two grammars, leading to the same wrong tokenization.
- Creating a first, crude parser that accepts the whole language but doesn't create the correct tree for `value`. I could then use a visitor and re-parse the `value` nodes with a different sub-parser, possibly inserting a new, correct subtree for `value`. This feels a bit clumsy.
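The idea behind the last approach, capturing `value` as raw text in the outer pass and handing that text to a separate parser with its own tokenization rules, can be sketched in plain Java without ANTLR. Everything here (`TwoStageParse`, `parseOuter`, `parseValue`, `Entry`) is hypothetical; in an actual ANTLR setup the inner step would presumably feed the captured text, e.g. via `CharStreams.fromString(...)`, to a separately generated lexer and parser. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-stage parse: the outer stage never tokenizes the value,
// it only captures it; the inner stage tokenizes it with its own rules.
public class TwoStageParse {
    // Combined result: the outer ID plus the inner "subtree" (here just words).
    record Entry(String key, List<String> valueWords) {}

    // Outer stage: only knows "ID -> <rest of line>".
    static Entry parseOuter(String line) {
        int arrow = line.indexOf("->");
        if (arrow < 0) throw new IllegalArgumentException("missing '->'");
        String key = line.substring(0, arrow).trim();
        if (!key.matches("[a-z]+")) throw new IllegalArgumentException("bad ID: " + key);
        String rawValue = line.substring(arrow + 2); // not tokenized here
        return new Entry(key, parseValue(rawValue));
    }

    // Inner stage: has no ID rule at all, so "is" is simply another Word.
    static List<String> parseValue(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split(" +")) {
            // Also rejects an empty value, mirroring Word+ (at least one Word).
            if (!w.matches("[a-zA-Z0-9]+")) throw new IllegalArgumentException("bad Word: " + w);
            words.add(w);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(parseOuter("mykey -> This is the value"));
    }
}
```

The design choice is that the two stages share nothing but a string boundary, so neither set of token rules can interfere with the other; the cost is that positions and error messages from the inner stage have to be mapped back to the outer input by hand.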
If you need a simple real-world application, consider strings in Java: some of them might be regexes, which need to be parsed with a completely different parser. This is similar to the injected languages you can use inside IntelliJ IDEA.
Question: Is there an idiomatic way in ANTLR4 to parse a specific rule with a different grammar? The best case would be if I could specify this at the grammar level, so that the resulting AST is a combination of the outer language containing a subtree of the injected language.