I am trying to parse a data file with ANTLR. Each line may start with optional whitespace, as in this example:
3 6
97 12
15 18
The following shows where each line starts (^) and ends ($). There is a newline at the end of the file, and there are no tabs.
^ 3 6$
^ 97 12$
^ 15 18$
^
My grammar is:
lines : line+;
line : ws1 {System.out.println("WSOPT :"+$ws1.text+":");}
       num1 {System.out.println("NUM1 "+$num1.text);}
       ws2 {System.out.println("WS :"+$ws2.text+":");}
       num2 {System.out.println("NUM2 "+$num2.text);}
       NEWLINE
     ;
num1 : INT ;
num2 : INT ;
ws1 : WSOPT;
ws2 : WS;
INT : '0'..'9'+;
NEWLINE : '\r'? '\n';
//WS : (' '|'\t' )+ ;
WS : (' ')+ ;
WSOPT : (' ')* ;
which gives
line 1:0 mismatched input ' ' expecting WSOPT
WSOPT :null:
NUM1 3
WS : :
NUM2 6
line 2:0 mismatched input ' ' expecting WSOPT
WSOPT :null:
NUM1 97
WS : :
NUM2 12
BUILD SUCCESSFUL (total time: 1 second)
(i.e. the leading WS has not been recognised and the last line has been missed).
I would also like to parse lines that start without whitespace, such as:
^12 34$
^ 23 97$
but I then get errors such as:
line 1:0 required (...)+ loop did not match anything at input ' '
I'd appreciate general explanations of parsing WS in ANTLR.
EDIT: @jitter has a useful answer. The {ignore=WS} option does not appear in "The Definitive ANTLR Reference", the book I am working from, so this is clearly a tricky area.
HELP still needed: I have modified the grammar to:
lines : line line line;
line
options { ignore=WS; }
    : ws1 {System.out.println("WSOPT :"+$ws1.text+":");}
      num1 {System.out.println("NUM1 "+$num1.text);}
      ws2 {System.out.println("WS :"+$ws2.text+":");}
      num2 {System.out.println("NUM2 "+$num2.text);}
      NEWLINE
    ;
but get the error:
illegal option ignore
EDIT: apparently this option has been removed in ANTLR v3: http://www.antlr.org/pipermail/antlr-interest/2007-February/019423.html
I have managed to get this working by using lexer constructs that discard whitespace, but not in the NEWLINE rule, and then writing the parser rules without any whitespace tokens. The key was to strip all WS in the lexer except the NEWLINE.
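A minimal sketch of that approach for ANTLR v3 with the Java target follows. It is not the grammar the poster actually used: the grammar name Lines and the labels a and b are assumptions chosen for illustration. The lexer tokenizes the input without knowing what the parser expects, so two rules that match the same characters (WS and WSOPT above) compete, and a rule that can match the empty string (WSOPT) is best avoided. Here spaces and tabs go to the hidden channel and the parser only ever sees INT and NEWLINE.

```antlr
grammar Lines;   // hypothetical name; the file would be Lines.g

// Parser rules: whitespace never reaches the parser, so it is not mentioned.
lines : line+ EOF ;

line  : a=INT b=INT NEWLINE
        { System.out.println("NUM1 " + $a.text + " NUM2 " + $b.text); }
      ;

// Lexer rules
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;                            // kept: it terminates a line
WS      : (' ' | '\t')+ { $channel = HIDDEN; } ;  // hidden from the parser
```

With this arrangement a line parses the same way whether or not it starts with spaces, because leading whitespace is consumed by WS before the parser looks at the token stream. Replacing { $channel = HIDDEN; } with { skip(); } discards the whitespace tokens entirely instead of hiding them.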
See "Lexical Analysis with ANTLR" and look for the section with this heading:
Ignoring whitespace in the lexer
You need to use the
{ ignore=WS; }
rule option.