Why does a programming language need keywords?

Posted 2019-03-09 16:43

Question:

For example (in C):

int break = 1;
int for = 2;

Why would the compiler have any trouble at all deducing that break and for are variables here?


So, we need keywords because

  • we want the programs to be readable
  • we do not want to over-complicate the job of already complex compilers of today
  • but most importantly, a language is a lot more powerful if some 'key'words are reserved for special actions. The language can then focus on being useful at a higher level, rather than struggling to implement a for loop in an unambiguous way.

Answer 1:

Then what would the compiler do when it comes across a statement like:

while(1) {
  ...
  if (condition)
    break;
}

Should it actually break out of the loop? Or should it treat the line as an expression statement that merely evaluates the variable break, i.e. 1;?

The language would become ambiguous in certain cases, or you'd have to create a very smart parser that can infer subtle syntax, and that's just unnecessary extra work.
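In a language with reserved words, the question is settled before parsing even starts: the lexer looks each word up in a fixed table, so break always becomes a keyword token and the parser never has to guess. A minimal sketch of that lookup follows; the classify_word function, the TokenKind names, and the abbreviated keyword list are illustrative assumptions, not taken from any particular compiler.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TOK_KEYWORD, TOK_IDENTIFIER } TokenKind;

/* Abbreviated, hypothetical keyword list; real C reserves more words. */
static const char *const reserved_words[] = {
    "break", "continue", "do", "else", "for", "if", "int", "return", "while"
};

/* Classify a word once, in the lexer. A reserved word can never come
   back as an identifier, so the parser never sees an ambiguous token. */
TokenKind classify_word(const char *word) {
    size_t n = sizeof reserved_words / sizeof reserved_words[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(word, reserved_words[i]) == 0) {
            return TOK_KEYWORD;
        }
    }
    return TOK_IDENTIFIER; /* anything else may name a variable */
}
```

With this table, classify_word("break") yields TOK_KEYWORD while classify_word("brake") yields TOK_IDENTIFIER, regardless of where the word appears.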



Answer 2:

It's not necessary -- Fortran didn't reserve any words, so things like:

if (if .eq. then) then
  if = else
else
  then = if
endif

are completely legal. This not only makes the language hard for the compiler to parse, but often makes it almost impossible for a person to read the code or spot errors. For example, consider classic Fortran (say, up through Fortran 77 -- I haven't used it recently, but I at least hope they've fixed a few things like this in more recent standards). A Fortran DO loop looks like this:

DO 10 I = 1,10

Without the two lines side by side, you can probably see how you might miss the difference in this one:

DO 10 I = 1.10

Unfortunately, the latter isn't a DO loop at all -- it's a simple assignment of the value 1.10 to a variable named DO10I (fixed-form Fortran ignores blanks, so the spaces in "DO 10 I" simply vanish). Since Fortran also supports implicit (undeclared) variables, this is (or was) all perfectly legal, and some compilers would even accept it without a warning!
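Because blanks carry no meaning and DO is not reserved, a classic fixed-form Fortran compiler has to look ahead before it can tell these two statements apart. A simplified sketch of that decision, assuming a hypothetical is_do_loop helper: drop the blanks, then treat a statement that starts with DO as a loop only if a comma follows the = sign outside any parentheses. Real compilers handle many more cases than this model does.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Simplified model: returns true if a fixed-form statement is a DO loop,
   false if it is (for example) an assignment to a variable like DO10I. */
bool is_do_loop(const char *stmt) {
    char buf[256];
    size_t len = 0;

    /* Blanks are insignificant, so remove them and normalize case. */
    for (const char *p = stmt; *p != '\0' && len + 1 < sizeof buf; p++) {
        if (*p != ' ') {
            buf[len++] = (char)toupper((unsigned char)*p);
        }
    }
    buf[len] = '\0';

    if (strncmp(buf, "DO", 2) != 0) return false;
    const char *eq = strchr(buf, '=');
    if (eq == NULL) return false;

    /* A top-level comma after '=' separates the loop bounds ("1,10"). */
    int depth = 0;
    for (const char *p = eq + 1; *p != '\0'; p++) {
        if (*p == '(') depth++;
        else if (*p == ')') depth--;
        else if (*p == ',' && depth == 0) return true;
    }
    return false; /* "1.10": just an assignment */
}
```

The single character that separates the two readings is exactly the one a human reader is most likely to overlook.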



Answer 3:

They don't. PL/1 famously has no keywords; every "keyword" (BEGIN, DO, ...) can also be used as a variable name. But allowing this means you can write really obscure code: IF DO>BEGIN THEN PRINT=CALL-GOTO; Reserving the language's "statement keywords" isn't usually a loss if that set of names is modest (as it is in every language I've ever seen except PL/1 :-).
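Without reserved words, a PL/1-style compiler has to classify each statement from context. One simplified rule, sketched here as an assumption rather than as PL/1's actual specification: if the first word of a statement is followed by =, the statement is an assignment, whatever that word is; otherwise the word is checked against the statement keywords. The hypothetical classify_statement below misses cases such as an assignment to an array element named IF, which a real compiler must also handle.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { STMT_ASSIGNMENT, STMT_KEYWORD, STMT_OTHER } StmtKind;

/* Hypothetical, abbreviated set of statement keywords. */
static const char *const statement_words[] = {
    "BEGIN", "CALL", "DO", "GOTO", "IF", "PRINT"
};

StmtKind classify_statement(const char *stmt) {
    char word[32];
    size_t n = 0;
    const char *p = stmt;

    /* Read the first word of the statement. */
    while (*p == ' ') p++;
    while (isalnum((unsigned char)*p) && n + 1 < sizeof word) {
        word[n++] = *p++;
    }
    word[n] = '\0';

    /* Context decides: "DO = 5;" assigns to a variable named DO. */
    while (*p == ' ') p++;
    if (*p == '=') return STMT_ASSIGNMENT;

    for (size_t i = 0; i < sizeof statement_words / sizeof statement_words[0]; i++) {
        if (strcmp(word, statement_words[i]) == 0) return STMT_KEYWORD;
    }
    return STMT_OTHER;
}
```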

APL also famously has no keywords. But it has a set of some 200 amazing iconic symbols in which to write complicated operators (the "domino" operator [don't ask!] is a square box with a calculator divide sign in the middle). In this case, the language designers simply used icons instead of keywords. The consequence is that APL has a reputation of being a "write-only" language.

Bottom line: it's not a requirement, but it tends to make programs a lot more readable if the keywords are reserved identifiers from a small set known to the programmers. (Some languages have insisted that "keywords" start with a special punctuation character like "." to allow all possible identifiers to be used, but this isn't worth the extra trouble to type or the clutter on the page; it's pretty easy to stay away from "identifiers" that match keywords when the keyword set is small.)
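Marking keywords with a prefix (a technique often called stropping) moves the whole decision into the lexer: a word without the marker is always an identifier, so IF and THEN remain free for variables. A minimal sketch, assuming a leading "." as the marker, a hypothetical classify_stropped function, and a made-up keyword list:

```c
#include <string.h>

typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_ERROR } TokenKind;

/* Hypothetical keyword set for a language that strops with '.'. */
static const char *const stropped_keywords[] = { "IF", "THEN", "ELSE", "DO", "END" };

TokenKind classify_stropped(const char *word) {
    if (word[0] != '.') {
        return TOK_IDENTIFIER; /* unmarked words are never keywords */
    }
    for (size_t i = 0; i < sizeof stropped_keywords / sizeof stropped_keywords[0]; i++) {
        if (strcmp(word + 1, stropped_keywords[i]) == 0) {
            return TOK_KEYWORD;
        }
    }
    return TOK_ERROR; /* marked, but not a known keyword */
}
```

The cost the answer mentions is visible here: every keyword in the source would have to be typed as .IF, .THEN and so on.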



Answer 4:

Since the question is tagged C: in the original C language, a variable declared without a type defaulted to type int.

This meant that foo; could declare a variable of type int.

Now suppose you write break;. How does the compiler know whether you want to declare a variable named break or use the keyword break?



Answer 5:

Several reasons:

  • The keywords may seem unambiguous in your samples, but a declaration is not the only place you would use a variable named break or for.

  • Writing the parser would be much harder and more error-prone, for little gain.

  • Using a keyword as a function or procedure name in a library may have undesired, possibly security-relevant, side effects.



Answer 6:

As others said, this makes it easier for the compiler to parse your source code. But I would like to add something: it also makes your source code more readable. Consider this example:

if (if > 0) then then = 10 end if

The second "if" and the second "then" are variables, while others are not. I think this kind of code is not readable. :)



Answer 7:

If we are talking about C++, it already has a very complicated grammar. Allowing keywords to be used as variable names, for example, would make it even more complicated.



Answer 8:

The compiler would have problems if you wrote something like this:

while(*s++);
return(5);

Is that a loop or a call to a function named while? Did you want to return the value 5 from the current function, or did you want to call a function named return?

It often simplifies things if constructs with special meaning simply have special names that can be used to unambiguously refer to them.
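Because while and return are reserved in C, both lines have exactly one reading. The first is a loop with an empty body; wrapped in a hypothetical helper named past_terminator, it advances s until it has stepped one past the string's terminating '\0':

```c
/* Returns a pointer one past the '\0' that terminates s. */
const char *past_terminator(const char *s) {
    while (*s++)
        ; /* empty body: all the work happens in the condition */
    return s;
}
```

For "abc", the returned pointer sits four characters past the start: the loop tests 'a', 'b', 'c', and finally '\0', incrementing s after each test.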



Answer 9:

Because we want to keep what few sanity points we have left:

void myfunction(bool) { .. };

funcp while = &myfunction;
while(true); 
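The snippet above relies on an undeclared funcp type and a pointer named while, so it cannot compile. For contrast, here is a legal sketch, assuming funcp means "pointer to a function taking a bool". The pointer gets an ordinary, hypothetical name (loop_fn), so calling it cannot be confused with a loop, and a counter makes each call observable:

```c
#include <stdbool.h>

typedef void (*funcp)(bool); /* assumed meaning of funcp */

static int call_count = 0;

static void myfunction(bool flag) {
    if (flag) {
        call_count++; /* record that the call happened */
    }
}

/* Calls myfunction through a pointer and returns the running call count. */
int call_via_pointer(void) {
    funcp loop_fn = &myfunction; /* `while` is reserved, so use another name */
    loop_fn(true);               /* unambiguously a call, not a loop */
    return call_count;
}
```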


Answer 10:

I guess the code would look very weird, and the parser might be impossible to write. For example:

int break = 1;
while (true) {
   // code to change break
   if (!break) break;   // not very readable code.
}


Answer 11:

Depending on the language definition a compiler may or may not need keywords. When it does not know what to do it can try to apply precedence rules or just fail.
An example:

void return(int i){printf("%d",i);}
public int foo(int a)
{
  if(a > 2)return (a+1)*2;
  return a + 3;
}

What happens if a is greater than 2?

  • The language specification may require the compiler to fail
  • The language specification may require the compiler use the return function
  • The language specification may require the compiler to return

You can define a language that doesn't use keywords. You can even define a language that allows you to redefine all symbols (since they are just very short keywords themselves).
The problem is not the compiler: if your specification is complete and error-free, it will work. The problem is PEBKAC: programs using this feature of the language will be hard to read, because you have to keep track of the symbol definitions.
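The three options listed above amount to a resolution policy that the language specification has to pick. A sketch of that choice, with hypothetical enum and function names, for the case where a user-defined function is named return:

```c
#include <stdbool.h>

/* Hypothetical policies a specification could mandate. */
typedef enum { POLICY_FAIL, POLICY_PREFER_FUNCTION, POLICY_PREFER_KEYWORD } Policy;
typedef enum { RESOLVE_ERROR, RESOLVE_CALL, RESOLVE_STATEMENT } Resolution;

/* Decide what `return (a+1)*2;` means under a given policy. */
Resolution resolve_return(Policy policy, bool user_defined_return) {
    if (!user_defined_return) {
        return RESOLVE_STATEMENT; /* no conflict: ordinary return */
    }
    switch (policy) {
    case POLICY_FAIL:            return RESOLVE_ERROR;     /* reject the program */
    case POLICY_PREFER_FUNCTION: return RESOLVE_CALL;      /* call return(int) */
    case POLICY_PREFER_KEYWORD:  return RESOLVE_STATEMENT; /* leave foo */
    }
    return RESOLVE_ERROR;
}
```

Any of the three is implementable; the point is that a reader must know which one the specification chose to understand foo.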



Answer 12:

FWIW, Tcl doesn't have any reserved words. You can have variables and functions named "if", "break", etc. The interpretation of a token is totally dependent on the context. The same token can represent a command in one context, a variable in another, or a literal string in another.



Answer 13:

In many cases, it would be possible for the compiler to interpret keywords as normal identifiers, as in your example:

int break = 1;
int for = 2;

As a matter of fact, I just wrote a compiler for a simple assembly-like toy language which does this, but warns the user in such cases.
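In a declaration such as int break = 1;, the grammar allows only an identifier right after the type name, so a compiler can accept the name and merely warn. A minimal sketch of that check, assuming a hypothetical check_declared_name helper that the parser calls at exactly that point, and an abbreviated keyword list:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Abbreviated, hypothetical keyword list. */
static const char *const keywords[] = { "break", "for", "if", "int", "while" };

static bool is_keyword(const char *name) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(name, keywords[i]) == 0) return true;
    }
    return false;
}

/* Called where only an identifier is grammatical. Accepts any name,
   but warns on a keyword. Returns true if a warning was issued. */
bool check_declared_name(const char *name) {
    if (is_keyword(name)) {
        fprintf(stderr, "warning: '%s' is a keyword; treating it as an identifier\n", name);
        return true;
    }
    return false;
}
```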

But sometimes the syntax is defined in a way that keywords and identifiers are ambiguous:

int break;

while(...)
{
    break; // <-- treat this as expression or statement?
}

And the most obvious reason is that editors emphasize keywords so that code is more readable for humans. Allowing keywords to be treated as identifiers would make syntax highlighting harder, and would also hurt the readability of your code.