Compilation of string literals

Posted 2019-02-18 10:36

Why can two string literals separated by a space, a tab, or a newline ("\n") be compiled without an error?

int main()
{
   const char * a = "aaaa"  "bbbb";
}

"aaaa" is a char*, and "bbbb" is a char*.

There is no specific concatenation rule for processing two string literals, and obviously the following code gives an error during compilation:

#include <iostream>
int main()
{
   const char * a = "aaaa";
   const char * b = "bbbb";
   std::cout << a b;
}

Is this concatenation common to all compilers? Where is the null terminator of "aaaa"? Is "aaaabbbb" a contiguous block of memory?

5 answers
Reply #2 · 2019-02-18 11:29

If you look at, e.g., this reference on the phases of translation, phase 6 does the following:

Adjacent string literals are concatenated.

And that's exactly what happens here. You have two adjacent string literals, and they are concatenated into a single string literal.

It is standard behavior.

It only works for string literals, not two pointer variables, as you noticed.

我命由我不由天
Reply #3 · 2019-02-18 11:30

In this statement

const char * a = "aaaa"  "bbbb";

the compiler, at some stage of translation before syntax analysis, treats adjacent string literals as one literal.

So for the compiler the above statement is equivalent to

const char * a = "aaaabbbb";

that is, the compiler stores only one string literal, "aaaabbbb".

霸刀☆藐视天下
Reply #4 · 2019-02-18 11:30

Adjacent string literals are concatenated as per the rules of C (and C++) standard. But no such rule exists for adjacent identifiers (i.e. variables a and b).

To quote, C++14 (N3797 draft), § 2.14.5:

In translation phase 6 (2.2), adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally-supported with implementation-defined behavior.

我欲成王,谁敢阻挡
Reply #5 · 2019-02-18 11:32

String literals placed side-by-side are concatenated at translation phase 6 (after the preprocessor). That is, "Hello," " world!" yields the (single) string "Hello, world!". If the two strings have the same encoding prefix (or neither has one), the resulting string will have the same encoding prefix (or no prefix).

(source)

别忘想泡老子
Reply #6 · 2019-02-18 11:34

In C and C++, the compiler treats adjacent string literals as a single string literal. For example, this:

"Some text..." "and more text"

is equivalent to:

"Some text...and more text"

This is for historical reasons:

The original C language was designed in 1969-1972 when computing was still dominated by the 80-column punched card. Its designers used 80-column devices such as the ASR-33 Teletype. These devices did not automatically wrap text, so there was a real incentive to keep source code within 80 columns. Fortran and Cobol had explicit continuation mechanisms to do so, before they finally moved to free format.

It was a stroke of brilliance for Dennis Ritchie (I assume) to realise that there was no ambiguity in the grammar and that long ASCII strings could be made to fit into 80 columns by the simple expedient of getting the compiler to concatenate adjacent literal strings. Countless C programmers were grateful for that small feature.

Once the feature is in, why would it ever be removed? It causes no grief and is frequently handy. I for one wish more languages had it. The modern trend is to have extended strings with triple quotes or other symbols, but the simplicity of this feature in C has never been outdone.

Similar question here.
