Why do compilers allow string literals not to be c

Posted 2019-09-18 16:39

Question:

And where are literals in memory exactly? (see examples below)

I cannot modify a literal, so it should presumably be a const char*, yet the compiler lets me use a char* for it, and I get no warnings even with most of the compiler flags enabled.

By contrast, an implicit conversion from const char* to char* gives me a warning, see below (tested on GCC, but VC++2010 behaves similarly).

Also, if I modify the value of a const char (with the trick below, for which GCC really ought to warn me), I get no error and can even modify and display it on GCC (even though I guess it is still undefined behavior; I wonder why it did not behave the same way with the literal). That is why I am asking where those literals are stored, and where ordinary const objects are supposedly stored.

const char* a = "test";
char* b = a; /* warning: initialization discards qualifiers 
  from pointer target type (on gcc), error on VC++2k10 */

char *c = "test"; // no compile errors
c[0] = 'p'; /* bus error at run time (we are not supposed to 
  modify const anyway, so why does this compile with no errors? And where is the 
  literal stored, given that I get a "bus error"? 
  I get 'access violation writing' on VC++2010) */

const char d = 'a';
*(char*)&d = 'b'; // no warnings (why not?)
printf("%c", d);  /* displays 'b' (why doesn't it behave the same way
  as modifying a literal?). It displays 'a' on VC++2010 */

Answer 1:

The C standard does not forbid the modification of string literals. It just says that the behaviour is undefined if the attempt is made. According to the C99 rationale, there were people in the committee who wanted string literals to be modifiable, so the standard does not explicitly forbid it.

Note that the situation is different in C++. In C++, string literals are arrays of const char. However, C++ used to allow an implicit conversion from a string literal to char *. That conversion was deprecated from the first C++ standard onward and removed in C++11.



Answer 2:

Mostly historical reasons. But keep in mind that they are somewhat justified: in C, string literals don't have type char *, but char [N], where N denotes the size of the buffer including the terminating null character (otherwise, sizeof wouldn't work as expected on string literals), and they can be used to initialize non-const arrays. Assigning them to pointers to const works only because of the implicit conversions from arrays to pointers and from non-const to const.

It would be more consistent if string literals exhibited the same behaviour as compound literals, but as these are a C99 construct and backwards-compatibility had to be maintained, this wasn't an option, so string literals stay an exceptional case.



Answer 3:

And where are literals in memory exactly? (see examples below)

Initialized data segment. On Linux it is either .data or .rodata.

I cannot modify a literal, so it would supposedly be a const char*, although the compiler let me use a char* for it, I have no warnings even with most of the compiler flags.

Historical reasons, as others have already explained. Most compilers let you choose, with a command-line option, whether string literals should be read-only or modifiable.

The reason it is generally desired to have string literals read-only is that the segment with read-only data in memory can be (and normally is) shared between all the processes started from the executable. That obviously frees some RAM from being wasted to keep redundant copies of the same information.



Answer 4:

I'm not certain what the C/C++ standards say about strings. But I can tell exactly what actually happens with string literals in MSVC, and I believe other compilers behave similarly.

String literals reside in a const data section. Their memory is mapped into the process address space. However, the memory pages they're stored in are read-only (unless the protection is explicitly changed at run time).

But there's something more you should know. Not all the C/C++ expressions containing quotes have the same meaning. Let's clarify everything.

const char* a = "test";

The above statement makes the compiler create a string literal "test". The linker makes sure it ends up in the executable file. In the function body, the compiler generates code that declares a variable a on the stack, which gets initialized with the address of the string literal "test".

char* b = a;

Here you declare another variable b on the stack, which gets the value of a. Since a points to a read-only address, so does b. Even the fact that b has no const semantics doesn't mean you may modify what it points to.

char *c = "test"; // no compile errors
c[0] = 'p';

The above generates an access violation. Again, the lack of const doesn't mean anything at the machine level.

const char d = 'a';
*(char*)&d = 'b';

First of all - the above is not related to string literals. 'a' is not a string. It's a character. It's just a number. It's like writing the following:

const int d = 55;
*(int*)&d = 56;

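The above code makes a fool of the compiler. You say the variable is const, yet you manage to modify it. This does not trigger a processor exception, since d resides in read/write memory nevertheless. The behavior is still undefined, so the compiler may also substitute the original constant wherever d is read, which would explain VC++2010 printing 'a'.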

I'd like to add one more case:

char b[] = "test";
b[2] = 'o';

The above declares an array on the stack, and initializes it with the string "test". It resides in the read/write memory, and can be modified. There's no problem here.



Answer 5:

I have no warnings even with most of the compiler flags

Really? When I compile the following code snippet:

int main()
{
    char* p = "some literal";
}

on g++ 4.5.0 even without any flags, I get the following warning:

warning: deprecated conversion from string constant to 'char*'



Answer 6:

You can write through c because you didn't declare it as a pointer to const. Declaring c as const char* would be correct practice, since the string literal on the right-hand side must not be modified (in C++ it even has type const char[N]).

It fails at run time because the "test" value is probably placed in a read-only segment, such as the code segment.