ANTLR4 - dynamically inject token

Posted 2019-06-07 05:37

Question:

So I'm writing a Python parser and I need to dynamically generate INDENT and DEDENT tokens (because Python doesn't use explicit delimiters) according to the Python grammar specification.

Basically I have a stack of integers representing indentation levels. In an embedded Java action in the INDENT token, I check if the current level of indentation is higher than the level on top of the stack; if it is, I push it on; if not, I call skip().

The problem is, if the current indentation level matches a level multiple levels down in the stack, I have to generate multiple DEDENT tokens, and I can't figure out how to do that.

My current code: (note that within_indent_block and current_indent_level are managed elsewhere)

fragment DENT: {within_indent_block}? (SPACE|TAB)+;

INDENT: {within_indent_block}? DENT
        {if(current_indent_level > whitespace_stack.peek().intValue()){
                 whitespace_stack.push(new Integer(current_indent_level));
                 within_indent_block = false;
         }else{
                 skip();
         }
         }
         ;    

DEDENT: {within_indent_block}? DENT
        {if(current_indent_level < whitespace_stack.peek().intValue()){
            while(current_indent_level < whitespace_stack.peek().intValue()){
                      whitespace_stack.pop();
                      <<injectDedentToken()>>; //how do I do this
            }
         }else{
               skip();
         }
         }
         ;

How do I do this and / or is there a better way?

Answer 1:

There are a few problems with the code you have posted.

  1. The INDENT and DEDENT rules are semantically identical (considering predicates and rule references, but ignoring actions). Since INDENT appears first, this means you can never have a token produced by the DEDENT rule in this grammar.
  2. The {within_indent_block}? predicate appears before you reference DENT as well as inside the DENT fragment rule itself. This duplication serves no purpose but will slow down your lexer.

The actual handling of post-matching actions is best placed in an override of Lexer.nextToken(). For example, you could start with something like the following.

// requires imports of java.util.ArrayDeque and java.util.Deque
private final Deque<Token> pendingTokens = new ArrayDeque<>();

@Override
public Token nextToken() {
    while (pendingTokens.isEmpty()) {
        Token token = super.nextToken();
        switch (token.getType()) {
        case INDENT:
            // handle indent here. to skip this token, simply don't add
            // anything to the pendingTokens queue and super.nextToken()
            // will be called again.
            break;

        case DEDENT:
            // handle dedent here. to emit several DEDENT tokens, add one
            // token per popped level to the pendingTokens queue. to skip
            // this token, simply don't add anything to the queue and
            // super.nextToken() will be called again.
            break;

        default:
            pendingTokens.add(token);
            break;
        }
    }

    return pendingTokens.poll();
}
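
The indentation bookkeeping that would go inside those INDENT/DEDENT cases can be sketched separately from ANTLR. The following is only a minimal illustration, not the answer's actual implementation: IndentTracker, startLine and drain are hypothetical names, token names are plain strings instead of Token objects, and it assumes the leading whitespace width of each line has already been measured. It shows how one line can queue several DEDENT tokens, one per popped stack level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper (not part of ANTLR): maps the indentation width of each
// new line to the INDENT/DEDENT token names that line implies.
class IndentTracker {
    // Stack of currently open indentation levels; 0 is the outermost level.
    private final Deque<Integer> levels = new ArrayDeque<>();
    // Tokens produced but not yet consumed, like pendingTokens in the answer.
    private final Deque<String> pending = new ArrayDeque<>();

    IndentTracker() {
        levels.push(0);
    }

    // Queue the tokens implied by a line indented by `width` columns.
    void startLine(int width) {
        if (width < 0) {
            throw new IllegalArgumentException("width must be >= 0");
        }
        if (width > levels.peek()) {
            // Deeper than the current block: open exactly one new level.
            levels.push(width);
            pending.add("INDENT");
            return;
        }
        // Shallower: close every level deeper than `width`, one DEDENT each.
        while (width < levels.peek()) {
            levels.pop();
            pending.add("DEDENT");
        }
        // The line must land exactly on a level that is still open.
        if (width != levels.peek()) {
            throw new IllegalStateException("inconsistent dedent to column " + width);
        }
    }

    // Remove and return all queued token names in emission order.
    List<String> drain() {
        List<String> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}
```

In a real lexer the same loop would presumably live in the DEDENT case of the nextToken() override, adding a Token object to pendingTokens for each popped level instead of a string.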