I have a very simple grammar that tries to match 'é' to the token E_CODE.
I tested it with the TestRig tool (using the -tokens option), but the character isn't matched correctly.
My input file was encoded in UTF-8 without BOM and I've used ANTLR version 4.4.
Could somebody else check this as well? I got this output on my console:
line 1:0 token recognition error at: 'Ă'
grammar Unicode;
stat:EOF;
E_CODE: '\u00E9' | 'é';
Your grammar file is not saved in UTF-8. According to Terence Parr's book, UTF-8 is the default encoding that ANTLR accepts for input grammar files.
I tested the grammar:
as follows:
and the following got printed to my console:
Tested with 4.2 and 4.3 (4.4 isn't in Maven Central yet).
EDIT
Looking at the source, I see that TestRig takes an optional -encoding parameter. Have you tried setting it?
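To illustrate why the encoding could matter here, below is a minimal Java sketch. It is not taken from ANTLR or TestRig; the class name EncodingDemo and the helpers decode and codePoints are made up for illustration. It decodes the UTF-8 bytes of 'é' with two different charsets. Using windows-1250 is only an assumption about the platform default on the asker's machine; it is chosen because it turns the first byte into 'Ă', which is the character shown in the error message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Hypothetical helper: decode raw bytes using the named charset.
    // Charset.forName throws UnsupportedCharsetException if the JVM lacks that charset.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    // Render each char as U+XXXX so the result does not depend on the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'é' (U+00E9) is stored as the two bytes 0xC3 0xA9 in UTF-8.
        byte[] utf8 = "\u00E9".getBytes(StandardCharsets.UTF_8);

        // Decoded with the right charset, the two bytes become the single char U+00E9.
        System.out.println(codePoints(decode(utf8, "UTF-8")));

        // Decoded as windows-1250 (assumed platform default), each byte becomes its own
        // char; the first one is U+0102, i.e. 'Ă'.
        System.out.println(codePoints(decode(utf8, "windows-1250")));
    }
}
```

If this is what is happening, the lexer sees 'Ă' followed by a second character instead of 'é', so E_CODE never gets a chance to match. Passing -encoding UTF-8 to TestRig should then make it read the input file as UTF-8 rather than with the platform default.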