Is D's grammar really context-free?

I posted this on the D newsgroup some months ago, but for some reason the answers never really convinced me, so I thought I'd ask here.

The grammar of D is apparently context-free.

The grammar of C++, however, isn't (even without macros). (Please read this carefully!)

Now granted, I know nothing (officially) about compilers, lexers, and parsers. All I know is from what I've learned on the web.
And here is what (I believe) I have understood regarding context, in not-so-technical lingo:

The grammar of a language is context-free if and only if you can always understand the meaning (though not necessarily the exact behavior) of a given piece of its code without needing to "look" anywhere else.

Or, in even less rigor:

The grammar cannot be context-free if I can't tell the type of an expression just by looking at it.

So, for example, C++ fails the context-free test because the meaning of confusing<sizeof(x)>::q < 3 > (2) depends on whether q is a value or a template, which in turn depends on sizeof(x).
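Here is a compilable variant of this well-known example, as a sketch. The global last_constructed and the functions with_char and with_char4 are illustrative additions, not part of the original example. The same token sequence is a pair of comparisons in one function and a constructor call in the other, purely because of the type that sizeof(x) selects:

```cpp
// Set by the q<M> constructor so that the second reading is observable.
int last_constructed = -1;

// Primary template: for most N, confusing<N>::q is a static int member.
template <int N>
struct confusing {
    static int q;
};
template <int N>
int confusing<N>::q = 0;

// Explicit specialization for N == 4: here q is a member class template.
template <>
struct confusing<4> {
    template <int M>
    struct q {
        q(int x) { last_constructed = M * 100 + x; }
    };
};

// sizeof(char) is always 1, so confusing<1>::q is an int and the
// expression parses as (q < 3) > (2): two comparisons.
bool with_char() {
    char x;
    return confusing<sizeof(x)>::q < 3 > (2);
}

// sizeof(char[4]) is always 4, so confusing<4>::q is a template and the
// identical tokens construct a temporary q<3> from the argument 2.
int with_char4() {
    char x[4];
    confusing<sizeof(x)>::q < 3 > (2);
    return last_constructed;
}
```

with_char returns false because 0 < 3 yields true (1), and 1 > 2 is false. with_char4 returns 302, recorded by the q<3>(2) constructor.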

So far, so good.

Now my question is: Can the same thing be said of D?

In D, hashtables can be created through a Value[Key] declaration, for example

int[string] peoplesAges;   // Maps names to ages

Static arrays can be defined with a similar syntax:

int[3] ages;   // Array of 3 elements

And templates can be used to make them confusing:

template Test1(T...)
{
    alias int[T[0]] Test1;  // eponymous member, so Test1!(...) names this type
}

template Test2(U...)
{
    alias int[U] Test2;  // LGTM
}

Test1!(5) foo;
Test1!(int) bar;
Test2!(int) baz;  // Guess what? It's invalid code.

This means that I cannot tell the meaning of T[0] or U just by looking at it (i.e. it could be a number, it could be a data type, or it could be a tuple of God-knows-what). I can't even tell if the expression is grammatically valid (since int[U] certainly isn't -- you can't have a hashtable with tuples as keys or values).

Any parse tree that I attempt to build for int[T[0]] would fail to make sense (since it would need to know whether the node contains a data type, a literal, or an identifier) unless it delays the decision until the value of T is known (making it context-dependent).

Given this, is D actually context-free, or am I misunderstanding the concept?

Why/why not?

Update:

I just thought I'd comment: It's really interesting to see the answers, since:

Some answers claim that C++ and D can't be context-free
Some answers claim that C++ and D are both context-free
Some answers support the claim that C++ is context-sensitive while D isn't
No one has yet claimed that C++ is context-free while D is context-sensitive :-)

I can't tell if I'm learning or getting more confused, but either way, I'm kind of glad I asked this... thanks for taking the time to answer, everyone!

7 answers

Post #2 · 干净又极端 · 2019-03-07 22:17

These answers are making my head hurt.

First of all, the complication with low-level languages, when figuring out whether they are context-free or not, is that the language you write in is often processed in several steps.

In C++ (order may be off, but that shouldn't invalidate my point):

it has to process macros and other preprocessor directives
it has to interpret templates
it finally interprets your code.

Because the first step can change the context of the second step and the second step can change the context of the third step, the language YOU write in (including all of these steps) is context sensitive.

The reason people defend a language as context-free is that the only exceptions that add context are the traceable preprocessor statements and template calls. You only have to follow two restricted exceptions to the rules to pretend the language is context-free.

Most languages are context-sensitive overall, but most have only these minor exceptions to being context-free.

Post #3 · 爷的心禁止访问 · 2019-03-07 22:19

To answer the question of whether a programming language is context-free, you must first decide where to draw the line between syntax and semantics. As an extreme example, in C a program that lets a signed integer overflow has undefined behavior, so it is invalid in a way that cannot always be detected. Clearly this can't be checked at compile time, let alone at parse time:

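#include <limits.h>
#include <stdio.h>

void FnThatMightNotReturn(void);  // defined elsewhere
int Test(int);                    // defined elsewhere

void Fn(void) {
  int i = INT_MAX;
  FnThatMightNotReturn();  // halting problem?
  i++;                     // signed overflow (undefined behavior) if reached
  if(Test(i)) printf("Weeee!\n");
}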

As a less extreme example that others have pointed out, declaration-before-use rules can't be enforced by a context-free syntax, so if you wish to keep your syntax pass context-free, that check must be deferred to the next pass.

As a practical definition, I would start with the question of: Can you correctly and unambiguously determine the parse tree of all correct programs using a context free grammar and, for all incorrect programs (that the language requires be rejected), either reject them as syntactically invalid or produce a parse tree that the later passes can identify as invalid and reject?

Given that the most correct spec for the D syntax is a parser (IIRC an LL parser) I strongly suspect that it is in fact context free by the definition I suggested.

Note: the above says nothing about what grammar the language documentation or a given parser uses, only whether a context-free grammar exists. Also, the only full documentation of the D language is the source code of the compiler DMD.

Post #4 · 叛逆 · 2019-03-07 22:24

There is a construct in D's lexer:

string ::= q" Delim1 Chars newline Delim2 "

where Delim1 and Delim2 are matching identifiers, and Chars does not contain newline Delim2.

This construct is context sensitive, therefore D's lexer grammar is context sensitive.
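Below is a minimal C++ recognizer for a simplified form of this construct. It assumes that the opening identifier is followed immediately by a newline and that the literal ends at the first line consisting of the same identifier followed by a double quote. The function name is_delimited_string is hypothetical. The key step is comparing the closing identifier against the opening one, which is a copy check rather than the kind of nesting a context-free grammar handles:

```cpp
#include <cctype>
#include <string>

// Simplified D-style identifier-delimited string:  q"ID<newline>...<newline>ID"
// The closing ID must equal the opening ID, for arbitrary identifiers.
bool is_delimited_string(const std::string& s) {
    if (s.size() < 2 || s[0] != 'q' || s[1] != '"') return false;
    std::size_t i = 2;
    // Opening identifier: [A-Za-z_][A-Za-z0-9_]*
    if (i >= s.size() ||
        !(std::isalpha(static_cast<unsigned char>(s[i])) || s[i] == '_'))
        return false;
    const std::size_t start = i;
    while (i < s.size() &&
           (std::isalnum(static_cast<unsigned char>(s[i])) || s[i] == '_'))
        ++i;
    const std::string delim = s.substr(start, i - start);
    // The identifier must be followed immediately by a newline.
    if (i >= s.size() || s[i] != '\n') return false;
    // The first newline + same identifier + '"' closes the literal
    // (so the body cannot contain that sequence), and it must end the input.
    const std::string closing = "\n" + delim + "\"";
    const std::size_t end = s.find(closing, i);
    return end != std::string::npos && end + closing.size() == s.size();
}
```

For example, q"EOS, a newline, hello, a newline, EOS" is accepted, while the same text closed with EOF" instead of EOS" is rejected.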

It's been a few years since I've worked with D's grammar much, so I can't remember all the trouble spots off the top of my head, or even if any of them make D's parser grammar context sensitive, but I believe they do not. From recall, I would say D's grammar is context free, not LL(k) for any k, and it has an obnoxious amount of ambiguity.

Post #5 · Luminary・发光体 · 2019-03-07 22:25

The grammar cannot be context-free if I can't tell the type of an expression just by looking at it.

No, that's flat out wrong. The grammar cannot be context-free if you can't tell if it is an expression just by looking at it and the parser's current state (am I in a function, in a namespace, etc).

The type of an expression, however, is a semantic meaning, not syntactic, and the parser and the grammar do not give a penny about types or semantic validity or whether or not you can have tuples as values or keys in hashmaps, or if you defined that identifier before using it.

The grammar doesn't care what it means, or if that makes sense. It only cares about what it is.
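To illustrate the split, here is a toy sketch in C++ (not D's actual parser) using the int[string] / int[3] syntax from the question. All names are hypothetical, and the symbol table is an assumed simplification that maps a name to "type" or "value". The parser records only that the text is "a type followed by a bracketed something". Deciding whether that means a static array or an associative array is left to a later semantic pass:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical AST node for "Elem[Index]"; Index is left unresolved.
struct BracketType {
    bool ok = false;
    std::string element;
    std::string index;
};

// Syntactic pass: no symbol table needed, so this step stays context-free.
BracketType parse_bracket_type(const std::string& src) {
    BracketType t;
    const std::size_t open = src.find('[');
    if (open == std::string::npos || open == 0 || src.back() != ']') return t;
    t.element = src.substr(0, open);
    t.index = src.substr(open + 1, src.size() - open - 2);
    t.ok = !t.index.empty();
    return t;
}

enum class Kind { StaticArray, AssociativeArray, Error };

// Semantic pass: only now are declarations consulted.
Kind classify(const BracketType& t, const std::map<std::string, std::string>& symbols) {
    if (!t.ok) return Kind::Error;
    const bool is_number = std::all_of(t.index.begin(), t.index.end(),
                                       [](unsigned char c) { return std::isdigit(c) != 0; });
    if (is_number) return Kind::StaticArray;              // int[3]
    const auto it = symbols.find(t.index);
    if (it == symbols.end()) return Kind::Error;          // undeclared name
    return it->second == "type" ? Kind::AssociativeArray  // int[string]
                                : Kind::StaticArray;      // int[N], N a constant
}
```

The parse tree for int[string] and for int[N] has the same shape. Only classify, which needs context, tells them apart.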

Post #6 · 一夜七次 · 2019-03-07 22:27

The property of being context free is a very formal concept; you can find a definition here. Note that it applies to grammars: a language is said to be context free if there is at least one context free grammar that recognizes it. Note that there may be other grammars, possibly non context free, that recognize the same language.

Basically what it means is that the definition of a language element cannot change according to which elements surround it. By language elements I mean concepts like expressions and identifiers and not specific instances of these concepts inside programs, like a + b or count.
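For reference, these are the standard textbook forms of the two kinds of production rules, with N the set of non-terminals and Σ the set of terminals:

```latex
\begin{aligned}
&\text{context-free:}      && A \to \gamma,
  && A \in N,\ \gamma \in (N \cup \Sigma)^{*} \\
&\text{context-sensitive:} && \alpha A \beta \to \alpha \gamma \beta,
  && A \in N,\ \alpha, \beta \in (N \cup \Sigma)^{*},\ \gamma \in (N \cup \Sigma)^{+}
\end{aligned}
```

Only in the first form is the left-hand side a lone non-terminal, which is exactly why what A can expand to cannot depend on its neighbours α and β. The textbook example of a language with no context-free grammar at all is { a^n b^n c^n : n >= 1 }.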

Let's try and build a concrete example. Consider this simple COBOL statement:

   01 my-field PICTURE 9.9 VALUE 9.9.

Here I'm defining a field, i.e. a variable, which is dimensioned to hold one integral digit, the decimal point, and one decimal digit, with initial value 9.9 . A very incomplete grammar for this could be:

field-declaration ::= level-number identifier 'PICTURE' expression 'VALUE' expression '.'
expression ::= digit+ ( '.' digit+ )

Unfortunately the valid expressions that can follow PICTURE are not the same valid expressions that can follow VALUE. I could rewrite the second production in my grammar as follows:

'PICTURE' expression ::= digit+ ( '.' digit+ ) | 'A'+ | 'X'+
'VALUE' expression ::= digit+ ( '.' digit+ )

This would make my grammar context-sensitive, because expression would be a different thing depending on whether it was found after 'PICTURE' or after 'VALUE'. However, as has been pointed out, this doesn't say anything about the underlying language. A better alternative would be:

field-declaration ::= level-number identifier 'PICTURE' format 'VALUE' expression '.'
format ::= digit+ ( '.' digit+ ) | 'A'+ | 'X'+
expression ::= digit+ ( '.' digit+ )

which is context-free.
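Below is a hand-written recognizer for the context-free version of the grammar above, as a sketch in C++. The function names are hypothetical, and tokenization is simplified as noted in the comments. Because format and expression are separate non-terminals, each check function looks only at its own token, never at the keyword before it:

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// expression ::= digit+ ( '.' digit+ )  (taken literally: the fraction is required)
bool is_expression(const std::string& t) {
    const std::size_t dot = t.find('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == t.size()) return false;
    for (std::size_t i = 0; i < t.size(); ++i)
        if (i != dot && !std::isdigit(static_cast<unsigned char>(t[i]))) return false;
    return true;
}

// True if t is one or more copies of c, i.e. 'A'+ or 'X'+.
bool is_run_of(const std::string& t, char c) {
    return !t.empty() && t.find_first_not_of(c) == std::string::npos;
}

// format ::= digit+ ( '.' digit+ ) | 'A'+ | 'X'+
bool is_format(const std::string& t) {
    return is_expression(t) || is_run_of(t, 'A') || is_run_of(t, 'X');
}

// field-declaration ::= level-number identifier 'PICTURE' format 'VALUE' expression '.'
// Assumed simplifications: tokens are whitespace-separated, the final '.'
// is glued to the last token, and any word counts as an identifier.
bool is_field_declaration(std::string src) {
    if (src.empty() || src.back() != '.') return false;
    src.pop_back();  // consume the terminating '.'
    std::istringstream in(src);
    std::vector<std::string> tok;
    for (std::string w; in >> w;) tok.push_back(w);
    if (tok.size() != 6) return false;
    const bool level_ok = tok[0].find_first_not_of("0123456789") == std::string::npos;
    return level_ok && tok[2] == "PICTURE" && is_format(tok[3]) &&
           tok[4] == "VALUE" && is_expression(tok[5]);
}
```

With this, "01 my-field PICTURE 9.9 VALUE 9.9." is accepted and "01 name PICTURE 9.9 VALUE XXX." is rejected, because XXX is a valid format but not a valid expression.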

As you can see this is very different from your understanding. Consider:

a = b + c;

There is very little you can say about this statement without looking up the declarations of a, b and c, in any of the languages for which this is a valid statement; however, this by itself doesn't imply that any of those languages is not context-free. What is probably confusing you is that context freedom is different from ambiguity. This is a simplified version of your C++ example:

a < b > (c)

This is ambiguous in that, by looking at it alone, you cannot tell whether it is a function template call or a boolean expression. The previous example, on the other hand, is not ambiguous; from the point of view of grammars it can only be interpreted as:

identifier assignment identifier binary-operator identifier semi-colon
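To illustrate that the grammar only sees token categories, here is a hypothetical mini-lexer that maps a statement to the category sequence above. It handles only identifiers, =, the four arithmetic operators and ;:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Map source text to token categories; names and meanings are discarded.
std::vector<std::string> categorize(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {
            // Consume the whole identifier, whatever it will turn out to name.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            out.push_back("identifier");
            continue;
        }
        switch (c) {
            case '=': out.push_back("assignment"); break;
            case '+': case '-': case '*': case '/': out.push_back("binary-operator"); break;
            case ';': out.push_back("semi-colon"); break;
            default: out.push_back("error"); break;
        }
        ++i;
    }
    return out;
}
```

Both a = b + c; and x=y*z; yield the same sequence, whatever a, b, c, x, y and z turn out to be.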

In some cases you can resolve ambiguities by introducing context sensitivity at the grammar level. I don't think this is the case with the ambiguous example above: in this case you cannot eliminate the ambiguity without knowing whether a is a template or not. Note that when such information is not available, for instance when it depends on a specific template specialization, the language provides ways to resolve ambiguities: that is why you sometimes have to use typename to refer to certain types within templates or to use template when you call member function templates.
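Here is a small sketch of both disambiguators. The names first_element, Holder and call_get are illustrative. Inside a template, the compiler cannot yet know what a dependent name such as Container::const_iterator or t.get refers to, so typename and template tell the parser how to read them:

```cpp
#include <vector>

// 'typename' says Container::const_iterator names a type; without it a
// dependent qualified name is assumed to be a value. Assumes c is non-empty.
template <typename Container>
typename Container::value_type first_element(const Container& c) {
    typename Container::const_iterator it = c.begin();
    return *it;
}

// Hypothetical class with a member function template.
struct Holder {
    template <int N>
    int get() const { return N * 2; }
};

// 'template' says get is a member template, so '<' opens a template
// argument list instead of being the less-than operator.
template <typename T>
int call_get(const T& t) {
    return t.template get<3>();
}
```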

Post #7 · 爷、活的狠高调 · 2019-03-07 22:31

Being context-free is first of all a property of generative grammars. It means that what a non-terminal can generate does not depend on the context in which the non-terminal appears (in a non-context-free generative grammar, the very notion of "the string generated by a given non-terminal" is in general difficult to define). This doesn't prevent the same string of symbols from being generated by two non-terminals (so the same string of symbols can appear in two different contexts with different meanings), and it has nothing to do with type checking.

It is common to extend the context-free definition from grammars to language by stating that a language is context-free if there is at least one context free grammar describing it.

In practice, no programming language is context-free, because things like "a variable must be declared before it is used" can't be checked by a context-free grammar (they can be checked by some other kinds of grammars). This isn't bad. In practice, the rules to be checked are divided in two: those you want to check with the grammar and those you check in a semantic pass. This division also allows for better error reporting and recovery, so you sometimes want to accept more in the grammar than would otherwise be possible, in order to give your users better diagnostics.
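Here is a sketch of what such a semantic pass can look like, assuming a deliberately trivial statement form. Stmt and first_undeclared_use are hypothetical names. The grammar accepts any sequence of declarations and uses, and a separate walk with a symbol table enforces the ordering rule:

```cpp
#include <set>
#include <string>
#include <vector>

// Output of a context-free parser: either "declare NAME" or "use NAME".
struct Stmt {
    bool is_declaration;
    std::string name;
};

// Semantic pass: walk statements in order with a symbol table and return
// the first name used before its declaration, or "" if there is none.
std::string first_undeclared_use(const std::vector<Stmt>& program) {
    std::set<std::string> declared;
    for (const Stmt& s : program) {
        if (s.is_declaration) {
            declared.insert(s.name);
        } else if (declared.count(s.name) == 0) {
            return s.name;
        }
    }
    return "";
}
```

Both orderings of the same two statements parse fine. Only this pass rejects the one where the use comes first.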

What people mean by stating that C++ isn't context-free is that this division isn't possible in a convenient way, with "convenient" including criteria such as "follows the official language description closely" and "my parser generator tool supports that kind of division". Allowing the grammar to be ambiguous and resolving the ambiguity in the semantic check is a relatively easy way to make the cut for C++, and it follows the C++ standard quite well. It is inconvenient, however, when you rely on tools that don't allow ambiguous grammars; when your tools do allow them, it is convenient.
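Here is a toy sketch of that strategy for the a < b > (c) case. The syntactic pass returns every tree the ambiguous grammar allows, and the semantic pass keeps the one consistent with the declarations. Parse, parse_ambiguous and disambiguate are hypothetical names, and the trees are plain strings for brevity:

```cpp
#include <set>
#include <string>
#include <vector>

// One entry of a parse forest: a printable tree shape plus the semantic
// fact this reading requires.
struct Parse {
    std::string shape;
    bool needs_template;  // true if this reading only works when 'a' is a template
};

// Syntactic pass: the ambiguous grammar yields both trees for  a < b > ( c ).
std::vector<Parse> parse_ambiguous(const std::string& a, const std::string& b,
                                   const std::string& c) {
    return {
        {"call(" + a + "<" + b + ">, " + c + ")", true},
        {"gt(lt(" + a + ", " + b + "), " + c + ")", false},
    };
}

// Semantic pass: keep only trees consistent with what 'a' was declared as.
std::vector<Parse> disambiguate(const std::vector<Parse>& trees, const std::string& a,
                                const std::set<std::string>& templates) {
    const bool a_is_template = templates.count(a) > 0;
    std::vector<Parse> kept;
    for (const Parse& t : trees)
        if (t.needs_template == a_is_template) kept.push_back(t);
    return kept;
}
```

If a is declared as a template, only call(a<b>, c) survives; otherwise only gt(lt(a, b), c) does.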

I don't know enough about D to say whether or not there is a convenient cut of the language rules into a context-free grammar plus semantic checks, but what you show is far from proving that there isn't.
