Can we define a non context-free grammar with ANTL

2020-02-14 06:35发布

I'm pretty new to ANTLR4 and now I'm trying to undertand which kind of grammars we might define with it.

As far as I got, there're two kind of rules in ANTLR: parser rules (lower case words) and lexer rules (upper-case words). Example:

grammar Test;

init: prog(','prog)*;

prog: A
     | prog
     ;

A: [a-z]+;

Form the grammar production rule standpoint I would say that parser rules are NON-TERMINAL symbols which can be replaced with a sequence of tokens defined by a lexer rules.

So, it's perfectly clear that the grammar is context-free by the definition. The alpahbet of the language generated by the grammar consists of all words making up from lower-case latin letter.

Question: Can we define a non context-free grammar using ANTLR4?

1条回答
放我归山
2楼-- · 2020-02-14 07:16

YES. (cough).

It is my understanding that you can add code to the rules. Arbitrary code can test arbitrary things, so the answer is "yes". In general, I don't think you can do this well with ANTLR, but this is pretty practical for lots of interesting special cases (e.g., accept all digit strings except those that are prime numbers).

NO.

I think if you stick to the the syntax specification allowed by ANTLR, the answer is "no". In fact, there are context-free grammars that you can "specify" with ANTLR that it cannot process correctly, which is true of most parser generators. (For ANTLR, this includes grammars with indirect left recursion, ambiguity, arbitrary lookahead, etc.) We even call most of these parser generators by the names of their "limitations", e.g., LL(1), LALR(k), etc.

Which ones can do full context free?

A few parser generators can handle full-, context free grammars. Earley and CYK parsers come to mind, but they aren't very fast so people tend to avoid using them. GLR parsers can do this (we use this in our tools because it really aids writing grammars for real languages [see my bio] but there are some grammars that make them pretty slow; you can mostly avoid these. Apparently GLL parsing schemes exist and are also full context free; I'd expect them to have performance troubles with some obtuse grammars too, but also to be pretty usable in practice.

The only parser generator I've heard of which can do a variety of context-sensitive grammars is MetaS. I've never used it, but the theory behind it is pretty impressive. The claim is that it can do arbitrary context-sensitive grammars; it will run into extremely high costs for arbitrarily nasty grammars, but that's actually not an objection in practice.

查看更多
登录 后发表回答