How to get error messages from ANTLR parsing?

Posted 2019-01-26 17:56

Question:

I wrote a grammar with ANTLR 4.4 like this:

grammar CSV;

file
  :  row+ EOF
  ;

row
  :  value (Comma value)* (LineBreak | EOF)
  ;

value
  :  SimpleValue
  |  QuotedValue
  ;

Comma
  :  ','
  ;

LineBreak
  :  '\r'? '\n'
  |  '\r'
  ;

SimpleValue
  :  ~(',' | '\r' | '\n' | '"')+
  ;

QuotedValue
  :  '"' ('""' | ~'"')* '"'
  ;

Then I used ANTLR 4.4 to generate the parser and lexer, and that step succeeded.

After generating the classes, I wrote some Java code to use the grammar:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class Main {

    public static void main(String[] args)
    {
        String source =  "\"a\",\"b\",\"c";
        CSVLexer lex = new CSVLexer(new ANTLRInputStream(source));
        CommonTokenStream tokens = new CommonTokenStream(lex);
        tokens.fill();
        CSVParser parser = new CSVParser(tokens);
        CSVParser.FileContext file = parser.file();
    }
}

The code above is a parser for CSV strings. In this example the input is "a","b","c, whose last value is missing its closing quote.

Output window:

line 1:8 token recognition error at: '"c'
line 1:10 missing {SimpleValue, QuotedValue} at '<EOF>'

I want to know how I can get these errors from a method (getErrors() or similar) in my code, instead of having them printed to the output window.

Can anyone help me?

Answer 1:

Using ANTLR for CSV parsing is a nuclear option IMHO, but since you're at it...

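  • Implement the ANTLRErrorListener interface. You can extend BaseErrorListener for that, since it provides empty implementations of all the callbacks. Collect the errors in a list.
  • Call parser.removeErrorListeners() to remove the default listener (ConsoleErrorListener, which is what prints the messages you see)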
  • Call parser.addErrorListener(yourListenerInstance) to add your own listener
  • Parse your input

Now, for the lexer, you may either do the same thing (removeErrorListeners/addErrorListener) or add the following rule at the end of the grammar:

UNKNOWN_CHAR : . ;

With this rule, the lexer will never fail (it will generate UNKNOWN_CHAR tokens when it can't do anything else) and all errors will be generated by the parser (because it won't know what to do with these UNKNOWN_CHAR tokens). I recommend this approach.
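Putting the steps together for the code in the question, here is a sketch that registers one collecting listener on both the lexer and the parser (the first lexer option). It assumes CSVLexer and CSVParser were generated from the CSV grammar in the question (with the value rule referring to SimpleValue) and uses ANTLRInputStream to stay compatible with ANTLR 4.4. The class name CsvErrors and the method parseCsv are hypothetical. One detail matters here: the question's code calls tokens.fill() before any listener could be swapped, and lexer errors are reported at that moment, so the listeners must be replaced before any tokens are produced. If you add the UNKNOWN_CHAR rule instead, the lexer listener simply never fires and every error arrives through the parser.

```java
import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

public class CsvErrors {

    /** Hypothetical helper: parses source and returns every reported lexer/parser error. */
    public static List<String> parseCsv(String source) {
        final List<String> errors = new ArrayList<>();

        // Anonymous listener that records errors instead of printing them
        BaseErrorListener collector = new BaseErrorListener() {
            @Override
            public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                                    int line, int charPositionInLine,
                                    String msg, RecognitionException e) {
                errors.add("line " + line + ":" + charPositionInLine + " " + msg);
            }
        };

        // Replace the lexer's console listener before any token is produced
        CSVLexer lexer = new CSVLexer(new ANTLRInputStream(source));
        lexer.removeErrorListeners();
        lexer.addErrorListener(collector);

        // Replace the parser's console listener before parsing starts
        CSVParser parser = new CSVParser(new CommonTokenStream(lexer));
        parser.removeErrorListeners();
        parser.addErrorListener(collector);

        // No tokens.fill() here: the token stream pulls tokens from the lexer
        // as the parser needs them, so lexer errors also reach the collector
        parser.file();
        return errors;
    }

    public static void main(String[] args) {
        for (String error : parseCsv("\"a\",\"b\",\"c")) {
            System.out.println(error);
        }
    }
}
```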