Why do some languages need semicolons?

Posted 2019-02-12 00:05

Question:

I understand that semicolons indicate the end of a line in languages like Java, but why?

I get asked this a lot by other people, and I can't really think of a good way to explain how it works better than just using line breaks or white space.

Answer 1:

They don't signal end of line, they signal end of statement.

Some languages don't require them, but those languages generally don't allow multiple statements on a single line, nor a single statement to span multiple lines without some other signal (like VB's _ continuation character).

Why do some languages allow multiple statements on a line? The philosophy is that whitespace is irrelevant (an end of line character is whitespace). This allows flexibility in how the code is formatted as formatting is not part of the semantic meaning.



Answer 2:

First of all, the semicolon is a statement separator, not a line separator. Some languages use the new line character as statement separator, but languages which ignore all whitespace tend to use the semicolon.

Why do languages ignore whitespace?

A language ignores whitespace to allow programmers to format the source code however they like. For example, in Java there is no difference between

if (welcome)
    System.out.println("hello world");

and

if (welcome) System.out.println("hello world");

This is not because there is one separate case for each of these in the grammar of the language, but because the whitespace is simply ignored.
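
A minimal Python sketch of that idea, assuming a deliberately simplified tokenizer (the TOKEN pattern and the tokens helper are illustrative names, not how a real Java compiler is built): once whitespace is dropped, both layouts produce the same token sequence.

```python
import re

# Simplified token pattern (an assumption for illustration):
# a double-quoted string, a word, or a single punctuation character.
TOKEN = re.compile(r'"[^"]*"|\w+|[^\w\s]')


def tokens(source):
    """Split source text into tokens, discarding all whitespace."""
    return TOKEN.findall(source)


two_lines = 'if (welcome)\n    System.out.println("hello world");'
one_line = 'if (welcome) System.out.println("hello world");'

# Both layouts yield the same tokens, so a parser working on tokens
# cannot tell them apart.
print(tokens(two_lines) == tokens(one_line))
```

The parser only ever sees the token list, which is why no separate grammar rule is needed for each layout.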

Why does a programming language need a statement separator?

This is the core of the question. To understand it, let's consider a small language without any statement separator. It contains the following statement types:

var x = foo()
y[0, 1] = x
bar()

Here, y is a two-dimensional array and x is written to one of the entries of y.

Now let's look at these statements the way the compiler would see them:

var x = foo() y[0, 1] = x bar()

Because there is no statement separator, the compiler has to recognize the end of each statement by itself, to make sense of the input. Is the compiler able to do so? I guess in the above example the compiler can do it.

Now, let's add another type of statement to our language:

[x, y] = ["hello", "world"]

The multi assignment allows the programmer to assign multiple values at once. After this line, the variable x will contain the value "hello" while the variable y contains "world". This might be really handy to allow multiple return values from a function. Now how does this work together with the remaining statement types?

Consider the following sequence of statements:

foo()
[x, y] = [1, 2]

First, we call the method foo. Afterwards, we assign 1 to x and 2 to y. At least this is what we meant to do. Here is what the compiler sees:

foo() [x, y] = [1, 2]

Is the compiler able to recognize each statement? No. There are at least two possible interpretations. The first is the one we intended. Here is the second one:

foo()[x, y] = [1, 2]

What does this mean? First, we call the method foo. This method is supposed to return a two-dimensional array. Now, we write the array [1, 2] at the position [x, y] in the returned array.

The compiler cannot recognize the statements, since there are at least two valid interpretations of the given input. Of course, this should never happen in a real programming language. In the given example the ambiguity might be easy to resolve, but the point is that it is hard to design an unambiguous programming language without a statement separator. It is hard because the language designer has to consider every possible sequence of statement types to be sure that no two of them can be confused.
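
As a rough illustration (using Python rather than the toy language above, since Python happens to accept both forms), the standard ast module shows that the same tokens parse into different programs depending on whether a separator sits between them. Here the separator is Python's newline, and the names foo, x and y are only parsed, never executed.

```python
import ast

# Same tokens, with and without a statement separator (a newline in Python).
separated = "foo()\n[x, y] = [1, 2]"
joined = "foo()[x, y] = [1, 2]"

tree_separated = ast.parse(separated)
tree_joined = ast.parse(joined)

# With the separator: a call statement followed by a multi-assignment.
kinds_separated = [type(node).__name__ for node in tree_separated.body]

# Without it: a single assignment into an element of the value foo() returns.
kinds_joined = [type(node).__name__ for node in tree_joined.body]
joined_target = type(tree_joined.body[0].targets[0]).__name__

print(kinds_separated)  # ['Expr', 'Assign']
print(kinds_joined)     # ['Assign']
print(joined_target)    # Subscript
```

Only the separator tells the parser which of the two meanings was intended.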

Thus, the statement separator helps the language designer to initially design the language, but more importantly it allows the language designer to easily extend the language in the future, for example by adding new statement types. This is a big thing, since once code is written in your language, you cannot simply change the grammar for existing statement types, because this will cause all the existing code to not compile anymore.

TL;DR

Summing it all up, the semicolon was introduced as a statement separator in whitespace-ignoring languages because it is easier to design and extend a language that has a statement separator.



Answer 3:

Many languages allow you to put in as much spacing as you like. This gives you control over how the code looks.

Consider:

 String result = "asdfsasdfs"
               + "asdfs"
               + "asdfsdf";

Because you are allowed to insert extra newlines, you can split that statement across several lines without a problem. The language still needs to know where the statement ends, and that is why you need a semicolon.



Answer 4:

Some programming languages use the semicolon to signify the end of a statement, which makes the language oblivious to whitespace as far as statements are concerned. One thing to bear in mind: if the compiler has to check for either a newline or a semicolon, it has to assess several different situations, and it might get what you wanted to do wrong. Checking for those situations also takes more work than simply looking for a semicolon at the end of the statement.

Some higher-level languages try to reduce semicolon use, or remove it altogether, to save a few keystrokes. These languages are more oriented toward the comfort of the programmer and generally come with all sorts of syntactic sugar; one could argue that not using semicolons is itself a kind of syntactic sugar.

Whether a language uses semicolons should follow from what the language is trying to accomplish. Languages like C and C++ are mostly about performance; Java and C# sit a bit higher in terms of abstraction; and then there are languages like Scala, Python and Ruby, which are designed mostly to make programming more comfortable at the cost of performance (Ruby openly admits this, and it is very pronounced in Python).

So why do some languages "need" semicolons?

  • Makes compiling easier
  • The designer of the language thinks it's more consistent
  • Historical reasons (Java, C# and C++ are also C's children for example)

One last thing: JavaScript actually inserts the missing semicolons itself while parsing (automatic semicolon insertion), so it's not really semicolon-free.



Answer 5:

Short answer:

Because everyone else does it.

In theory, a statement is whatever the language designer's parser is able to recognize syntactically when it reads your file. So if the language designer did not want semicolons, they could use periods, dashes, spaces, newlines, or anything else to mark the separation between statements.

Language designers often make the syntax easy to understand so that it can become popular.

Wikipedia: Semicolon Usage in Computer Languages

So if some language designer created a language that used ':-)' to denote the end of a statement, it would 1) be hard to read and 2) be unpopular with people who are already used to ';'.

echo "Take Care" :-)



Answer 6:

Languages do it because the semicolon signifies the end of a statement, not the end of a line. This means you can compress code so that it takes up fewer lines.

Take the C++ code (#include <iostream>):

for(int i = 0; i < 5; ++i){
    std::cout << "did you know?" << std::endl; 
    std::cout << "; signifies **end of statement**" << std::endl;
    std::cout << "**not the end of the line**" << std::endl;
}

It could also be written

for(int i = 0; i < 5; ++i){std::cout << "did you know?" << std::endl; std::cout << "; signifies **end of statement**" << std::endl; std::cout << "**not the end of the line**" << std::endl;}


Answer 7:

Short answer:

Because everyone else does it.

No, not everyone does. Many popular languages, such as Python, Ruby, and Visual Basic, end a statement with a line break rather than a semicolon. Many languages (not "everyone") still use semicolons for historical reasons rather than on rational grounds: semicolons played an important role in replacing the punched-card format in the early days of computing, but today they can be discarded entirely.

In fact, there are two popular ways to specify the end of a statement:

  1. Using a semicolon.
  2. Leaving the line as is. The compiler then reads a line break as the end of the statement. When you want to extend a statement over more than one line, you use a special character (like \ in Python) to say that the statement has not finished.

To make code more readable, using a special character to mark the end of a statement should be the exception, not the rule.
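
A small Python sketch of the second approach (the variable names are only illustrative): a line break ends the statement, a trailing backslash continues it, and, as an addition to what the answer mentions, Python also ignores line breaks inside parentheses.

```python
# A line break normally ends the statement.
total = 1 + 2

# A trailing backslash says the statement continues on the next line.
total_with_backslash = 1 + 2 \
    + 3

# Inside parentheses, brackets, or braces, line breaks are ignored as well.
total_in_parens = (1 + 2
                   + 3)

print(total, total_with_backslash, total_in_parens)  # 3 6 6
```

In both continued forms the marker appears only on the statements that need it, which is the "exception, not the rule" trade-off described above.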