Accessing channels in ANTLR 4 and parsing them separately

Posted 2019-02-27 22:11

Question:

I have put my comments on a separate channel in ANTLR 4; in my case it is channel 2.

This is my lexer grammar.

COMMENT: '/*' .*? '*/' -> channel(2) 
       ;

I want to access channel 2 and parse the tokens on it to collect the comments, so I added the following rule to the parser grammar:

comment
:COMMENT
;

In the program

        string s = " parsing string";
        AntlrInputStream input = new AntlrInputStream(s);
        CSSLexer lexer = new CSSLexer(input); 

        CommonTokenStream tokens = new CommonTokenStream(lexer,2);

Then I want to do the parsing on the tokens

var xr = parser.comment().GetRuleContexts<CommentContext>();

because I want to get the information from the CommentContext object such as Start.Column etc.

EDIT:

This is the improved question

To be more specific, I want to get all the tokens on channel 2 and parse them with the comment rule, collecting the comments in a list (IReadOnly<CommentContext>) so that I can iterate over them and read each one's start line, start column, end line, end column, and token text.

CommonTokenStream tokens = new CommonTokenStream(lexer,2);

This does not give me only the tokens on channel 2. I also discovered that I cannot access the tokens until the stream has been passed to the parser constructor:

XParser parser = new XParser(tokens);

Only then can I access the tokens by calling GetTokens(). In the tokens I can see that the comments are recognized as tokens and are on channel 2. But even though the CommonTokenStream specifies the channel number as above, it contains all the tokens.

  1. Why can't I access the tokens until the parser object has been created from the token stream?

  2. I want to get a CommonTokenStream for channel 2 and pass it to the XParser constructor so that these tokens are parsed with my comment grammar. What is the best way to do this with the ANTLR 4 API?

Answer 1:

CommonTokenStream internally tracks all tokens from every channel. The only input you won't see when you call getTokens() is text matched by lexer rules that executed a -> skip action (no token is even created for those rules).

You can look at the tokens on channel 2 by using the TokenStream.LT and IntStream.consume methods.

Java example:

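// tokenSource is the lexer; 2 is the channel this stream reads from.
CommonTokenStream cts = new CommonTokenStream(tokenSource, 2);
List<Token> tokens = new ArrayList<Token>();
while (cts.LA(1) != Token.EOF) {
    tokens.add(cts.LT(1)); // next token on channel 2
    cts.consume();
}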

C# example:

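// tokenSource is the lexer; 2 is the channel this stream reads from.
CommonTokenStream cts = new CommonTokenStream(tokenSource, 2);
IList<IToken> tokens = new List<IToken>();
while (cts.La(1) != TokenConstants.Eof)
{
    tokens.Add(cts.Lt(1)); // next token on channel 2
    cts.Consume();
}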
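
The point that the stream buffers every token but hands out only those on the chosen channel can be illustrated without the ANTLR runtime. The following is a minimal sketch under stated assumptions: Tok, sample and onChannel are hypothetical stand-ins invented for this illustration and are not part of the ANTLR API. It only models the filtering idea and does not reproduce the runtime's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ChannelFilterSketch {

    // Hypothetical stand-in for an ANTLR token; only the fields used below.
    static final class Tok {
        final String text;
        final int channel;
        final int line;
        final int column;

        Tok(String text, int channel, int line, int column) {
            this.text = text;
            this.channel = channel;
            this.line = line;
            this.column = column;
        }
    }

    // Stand-in for the stream's internal buffer: tokens from every channel.
    static List<Tok> sample() {
        List<Tok> buffer = new ArrayList<>();
        buffer.add(new Tok("a", 0, 1, 0));
        buffer.add(new Tok("/* first */", 2, 1, 2));
        buffer.add(new Tok("{", 0, 1, 14));
        buffer.add(new Tok("/* second */", 2, 2, 4));
        buffer.add(new Tok("}", 0, 3, 0));
        return buffer;
    }

    // Keeps only the tokens on the requested channel, in source order.
    static List<Tok> onChannel(List<Tok> buffer, int channel) {
        List<Tok> result = new ArrayList<>();
        for (Tok t : buffer) {
            if (t.channel == channel) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tok> all = sample();
        System.out.println("buffered: " + all.size());
        // Position and text are read straight from the tokens; no parse is needed.
        for (Tok t : onChannel(all, 2)) {
            System.out.println(t.line + ":" + t.column + " " + t.text);
        }
    }
}
```

In this model the full buffer plays the role of what getTokens() returns, and onChannel(buffer, 2) plays the role of what an LT/consume loop over a channel-2 stream would collect.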


Answer 2:

How about this:

 // GetTokens() returns only what is already buffered, so call tokens.Fill() first
 var allowedChannels = new[] { 2 }; // add more if you need to
 var tokensImInterestedIn = tokens.GetTokens().Where(token => allowedChannels.Contains(token.Channel) && token.Type != CSSLexer.Eof).ToArray();

 // if you're just interested in one particular channel
 var tokensImInterestedIn = tokens.GetTokens().Where(token => token.Channel == 2 && token.Type != CSSLexer.Eof).ToArray();


Answer 3:

Alternatively you could put all the other tokens in another channel and use the default channel for your parser.

Of course this would not work if you have two parsers that expect tokens in separate channels.



Answer 4:

ANTLR 4, C#:

using Antlr4.Runtime;
...

MyLexer lexer = new MyLexer(inputStream);
var tokenstream = new CommonTokenStream(lexer, TokenConstants.HiddenChannel);
IList<IToken> tokens = new List<IToken>();

while (tokenstream.La(1) != TokenConstants.Eof)
{
    tokens.Add(tokenstream.Lt(1));
    tokenstream.Consume();
}
foreach (IToken iToken in tokens)
{
    Console.WriteLine(" Line : {0} Text : {1} ",
                      iToken.Line,
                      iToken.Text);
}