Unable to parse APL Symbol using ANTLR

Posted 2019-02-26 01:49

Question:

I am trying to parse APL expressions using ANTLR, as a sort of APL source code parser. It parses normal characters but fails to parse special symbols (like '←').

expression = N←0

Lexer

/* Lexer Tokens. */

NUMBER:    
 (DIGIT)+ ( '.' (DIGIT)+ )?;

ASSIGN:
    '←'
    ;

DIGIT : 
    [0-9]
    ;

Output:

[@0,0:1='99',<NUMBER>,1:0]
[@1,4:6='â??',<'â??'>,2:0]
[@2,7:6='<EOF>',<EOF>,2:3]

Can someone help me parse the special characters of the APL language?

I am following the steps below.

  1. Wrote the grammar.
  2. Used "antlr4.bat" to generate the parser from the grammar.
  3. Used "grun.bat" to produce the token output.

Answer 1:

Regarding step 3 of the question ("grun.bat" is used to generate token):

That just means your terminal cannot display the character properly. There is nothing wrong with the generated parser or lexer: they are able to recognise '←'.

Rather than using the bat file, test your lexer and parser by writing a small class yourself in your favourite IDE (one that can display the characters properly).

Something like this:

grammar T;

expression
 : ID ARROW NUMBER
 ;

ID     : [a-zA-Z]+;
ARROW  : '←';
NUMBER : [0-9]+;
SPACE  : [ \t\r\n]+ -> skip;

and a main class:

import org.antlr.v4.runtime.*;

public class Main {
  public static void main(String[] args) {
    TLexer lexer = new TLexer(CharStreams.fromString("N ← 0"));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    System.out.println(parser.expression().toStringTree(parser));
  }
}

which will display:

(expression N ← 0)
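
If the same class is run from a plain Windows console instead of an IDE, the arrow may still come out garbled, because System.out uses the platform default encoding. The following is only a sketch of one way around that, assuming the console itself has been switched to UTF-8 (for example with chcp 65001). The class name Utf8Console and the helper utf8 are made up for illustration and are not part of ANTLR:

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
  // Hypothetical helper: wraps a stream so that text is always encoded as UTF-8,
  // regardless of the platform default encoding.
  static PrintStream utf8(OutputStream out) throws UnsupportedEncodingException {
    return new PrintStream(out, true, "UTF-8");
  }

  public static void main(String[] args) throws Exception {
    PrintStream out = utf8(System.out);
    // Stand-in for parser.expression().toStringTree(parser)
    out.println("(expression N \u2190 0)");
  }
}
```

In the Main class above, the parse tree string would be passed to such a stream instead of System.out.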

EDIT

You could also try using the Unicode escape for the arrow, which keeps the grammar file itself pure ASCII, like this:

grammar T;

expression
 : ID ARROW NUMBER
 ;

ID     : [a-zA-Z]+;
ARROW  : '\u2190';
NUMBER : [0-9]+;
SPACE  : [ \t\r\n]+ -> skip;

and the Java class:

import org.antlr.v4.runtime.*;

public class Main {
  public static void main(String[] args) {
    String source = "N \u2190 0";
    TLexer lexer = new TLexer(CharStreams.fromString(source));
    TParser parser = new TParser(new CommonTokenStream(lexer));
    System.out.println(source + ": " + parser.expression().toStringTree(parser));
  }
}

which will print:

N ← 0: (expression N ← 0)
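
One possible reason the escape variant can help: the token in the question's output spans three characters (4:6) and shows as 'â??', which is what a single '←' looks like when its UTF-8 bytes are decoded with a single-byte charset. This is an assumption about the asker's setup, not something confirmed in the thread. A small stand-alone sketch (the class name ArrowBytes is made up for illustration) that reproduces the effect with plain Java:

```java
import java.nio.charset.StandardCharsets;

class ArrowBytes {
  // Encodes s as UTF-8 and then decodes those bytes with the wrong charset
  // (ISO-8859-1), mimicking a file that is read with a mismatched encoding.
  static String misread(String s) {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    return new String(utf8, StandardCharsets.ISO_8859_1);
  }

  public static void main(String[] args) {
    String garbled = misread("\u2190");
    System.out.println(garbled.length()); // prints 3: one arrow became three chars
    for (char c : garbled.toCharArray()) {
      System.out.printf("U+%04X%n", (int) c); // prints U+00E2, U+0086, U+0090
    }
  }
}
```

U+00E2 is 'â', and the other two are control characters that most terminals render as '?'. If this is indeed the cause, both the ANTLR tool and TestRig (the class that grun runs) accept an -encoding option, so passing -encoding UTF-8 to each may be enough without changing the grammar.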


Tags: antlr apl