I have a Hello.g4 grammar file with a grammar definition:
definition : wordsWithPunctuation ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
Now, when I try to build a parse tree from the following input:
a b c d of at of abc bcd of
a b c d at abc, bcd
a b c d of at of abc, bcd of
it reports errors such as:
Hello::definition:1:31: extraneous input 'of' expecting {<EOF>, '(', '"', WORD, PUNCTUATION}
whereas this input:
a b c d at: abc bcd!
parses correctly.
What is wrong here: the grammar, the input, or the interpreter?
If I modify the wordsWithPunctuation rule by adding (... | 'of' | ',' word | ...), then it matches the input completely, but that looks suspicious to me: how is the word 'of' different from the word 'a' or 'abc'? And why is ',' different from the other punctuation characters (i.e., why does the rule match ':' or '!', but not ',')?
Update1:
I am working with the ANTLR4 plugin for Eclipse, so the project is built with the following output:
ANTLR Tool v4.2.2 (/var/folders/.../antlr-4.2.2-complete.jar)
Hello.g4 -o /Users/.../eclipse_workspace/antlr_test_project/target/generated-sources/antlr4 -listener -no-visitor -encoding UTF-8
Update2:
The grammar presented above is only a part of the full grammar:
grammar Hello;
text : (entry)+ ;
entry : blub 'abrr' '-' ('1')? '.' ('(' NUMBER ')')? sims '-' '(' definitionAndExamples ')' 'Hello' 'all' 'the' 'people' 'of' 'the' 'world';
blub : WORD ;
sims : sim (',' sim)* ;
sim : words ;
definitionAndExamples : definitions (';' examples)? ;
definitions : definition (';' definition )* ;
definition : wordsWithPunctuation ;
examples : example (';' example )* ;
example : '"' wordsWithPunctuation '"' ;
words : (WORD)+ ;
wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ;
NUMBER : [0-9]+ ;
word : WORD ;
WORD : [A-Za-z-]+ ;
punctuation : PUNCTUATION ;
PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
It now looks to me as if the literal words in the entry rule are somehow breaking the other rules used within the entry rule. But why? Is this a kind of anti-pattern in the grammar?
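
To make the suspicion concrete, here is a rough Python sketch of how I imagine the lexer might be choosing token types. It is not ANTLR code and does not call ANTLR. It rests on an assumption: that every quoted literal in a parser rule (such as 'of' or ',') becomes its own implicit token type, that the longest match wins, and that ties go to whichever rule is listed first, with the literals listed ahead of WORD and PUNCTUATION. The names RULES and tokenize are hypothetical, and only two of the literals from the entry and sims rules are modelled.

```python
import re

# Hypothetical model of lexer priority, NOT ANTLR's actual implementation.
# Assumption: literals quoted in parser rules become implicit tokens that are
# listed ahead of the explicit lexer rules; the longest match wins, and ties
# go to the rule listed first.
RULES = [
    ("'of'", re.compile(r"of")),              # literal from the entry rule
    ("','", re.compile(r",")),                # literal from the sims rule
    ("WORD", re.compile(r"[A-Za-z-]+")),
    ("PUNCTUATION", re.compile(r"[,!?':.]")),
]
WS = re.compile(r"[ \t\r\n]+")


def tokenize(text):
    """Return (token_type, lexeme) pairs for the given text."""
    tokens = []
    pos = 0
    while pos < len(text):
        ws = WS.match(text, pos)
        if ws:
            pos = ws.end()  # whitespace is skipped, as in the WS rule
            continue
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when match lengths are equal.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("abc of, often at: bcd!"))
```

Under this model, 'of' and ',' come out with the literal token types rather than WORD and PUNCTUATION, while "often", ':' and '!' are still WORD and PUNCTUATION. That would be consistent with the error message listing WORD and PUNCTUATION as expected while rejecting 'of', but I have not confirmed that ANTLR really works this way.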