Why a segmentation fault for changing a non-const

Posted 2019-07-15 06:31

Question:

With this code, I get a segmentation fault:

   char* inputStr = "abcde";
   *(inputStr+1)='f';

If the code was:

   const char* inputStr = "abcde";
   *(inputStr+1)='f';

I get a compile error for "assigning read-only location". In the first case, however, there is no compile error, only a segmentation fault when the assignment actually executes.

Can anyone explain this?

Answer 1:

Here is what the standard says about string literals in section [2.13.4/2]:

A string literal that does not begin with u, U, or L is an ordinary string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n const char”, where n is the size of the string as defined below; it has static storage duration (3.7) and is initialized with the given characters.

So, strictly speaking, "abcde" has type

const char[6]

Now what happens in your code is an implicit conversion to

char*

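so that the initialization is allowed. The likely reason is compatibility with C; the conversion was deprecated in C++03 and removed in C++11, although compilers such as GCC still accept it with a warning. Have a look also at the discussion here: http://learningcppisfun.blogspot.com/2009/07/string-literals-in-c.html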

Once the conversion is done, you are syntactically free to modify the literal, but the write fails because the compiler stores the literal in a non-writable segment of memory, as the standard allows.
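The type claim above can be checked at compile time. The following is a minimal sketch, assuming a C++11 or later compiler; `literal_size` is a hypothetical helper introduced only for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue of type "array of n const char",
// so decltype yields a reference to that array type.
static_assert(std::is_same<decltype("abcde"), const char (&)[6]>::value,
              "five letters plus the terminating null character");

// Hypothetical helper: deduces the array size of a literal passed by reference.
template <std::size_t N>
constexpr std::size_t literal_size(const char (&)[N]) { return N; }

static_assert(literal_size("abcde") == 6, "size includes the terminator");
```

Because the array is const, binding it to a plain char* needs the special conversion discussed above; binding it to const char* needs nothing special.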



Answer 2:

The literal here typically gets created in a read-only segment, often alongside the code:

char *a = "abcde";

Essentially it's const.

If you wish to edit it, try:

char a[] = "abcde";
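As a rough sketch of why the array form works: the array is a separate, writable object that is merely initialized from the literal, so the write from the question lands in the copy. The function name `replace_second_char` is made up for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: performs the write from the question on a writable copy.
std::string replace_second_char() {
    char inputStr[] = "abcde";   // own storage, initialized from the literal
    assert(inputStr[1] == 'b');  // the copy starts out identical to the literal
    *(inputStr + 1) = 'f';       // fine: modifies the array, not the literal
    return inputStr;             // copied into a std::string before the array dies
}
```

Calling `replace_second_char()` yields "afcde", and the literal itself is never touched.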


Answer 3:

The standard states that you are not allowed to modify string literals directly, regardless of whether you mark them const or not:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.

In fact, in C (unlike C++), string literals are not const but you're still not allowed to write to them.

This restriction on writing allows certain optimisations to take place, such as sharing of literals along the lines of:

char *ermsg = "invalid option";
char *okmsg =   "valid option";

where okmsg can actually point to the 'v' character in ermsg, rather than being a distinct string.
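Whether a given compiler actually overlaps the two literals can be observed, though not relied upon. This is only a sketch: the pointers are declared const-correct here, `tail_shared` and `same_text` are hypothetical helper names, and either result from `tail_shared` is conforming.

```cpp
#include <cassert>
#include <cstring>

const char* const ermsg = "invalid option";
const char* const okmsg =   "valid option";

// True if the implementation stored okmsg inside the tail of ermsg.
// The answer depends on the compiler and its optimisation settings.
bool tail_shared() { return okmsg == ermsg + 2; }

// The text is the same either way, which is what makes the sharing legal.
bool same_text() {
    assert(std::strlen(ermsg) >= 2);  // ermsg + 2 stays inside the literal
    return std::strcmp(okmsg, ermsg + 2) == 0;
}
```

If writes to literals were permitted, changing ermsg could silently change okmsg whenever `tail_shared()` happened to be true.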



Answer 4:

String literals are typically stored in read-only memory. Trying to change this memory will usually crash your program.

Here's a good explanation: Is a string literal in c++ created in static memory?



Answer 5:

It is mostly ancient history; once upon a long time ago, string literals were not constant.

However, most modern compilers place string literals into read-only memory (typically, the text segment of your program, where your code also lives), and any attempt to change a string literal will yield a core dump or equivalent.

With G++, you can most certainly get the compilation warning (-Wall if it is not enabled by default). For example, G++ 4.6.0 compiled on MacOS X 10.6.7 (but running on 10.7) yields:

$ cat xx.cpp
int main()
{
    char* inputStr = "abcde";
   *(inputStr+1)='f';
}
$ g++ -c xx.cpp
xx.cpp: In function ‘int main()’:
xx.cpp:3:22: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
$

So the warning is enabled by default.



Answer 6:

What happened is that the compiler put the constant "abcde" in some read-only memory segment. You pointed your (non-const) char* inputStr at that constant and wrote through it, and kaboom, segfault.

Lesson to be learned: Don't invoke undefined behavior.

Edit (elaboration)

"However, for the first case, there is no compile error, just segmentation fault when the assign operation actually happened."

You need to enable your compiler warnings. Always set your compiler warnings as high as possible.



Answer 7:

Even though "abcde" is a string literal, which should not be modified, you've told the compiler that you don't care about that by having a non-const char* point to it.

The compiler will happily assume that you know what you're doing, and not throw an error. However, there's a good chance that the code will fail at runtime when you do indeed try to modify the string literal.
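One way to keep the compiler on your side is to hold the literal through a pointer-to-const and edit a copy instead. The sketch below assumes a std::string copy is acceptable; `edited_copy` is a hypothetical name, not something from the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the parameter type documents that the literal is read-only.
std::string edited_copy(const char* literal) {
    assert(literal != nullptr);
    // literal[1] = 'f';            // would not compile: assignment through pointer-to-const
    std::string copy(literal);      // writable copy owned by the string
    if (copy.size() > 1) {
        copy[1] = 'f';
    }
    return copy;
}
```

With this shape, `edited_copy("abcde")` returns "afcde", and an accidental write to the literal becomes a compile error rather than a runtime crash.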



Answer 8:

String literals, while officially non-const in C (in C++ they are const), are almost always stored in read-only memory. In your setup that is evidently the case either way; declaring the pointer as const char* only lets the compiler reject the write at compile time.

Note that the standard forbids you to modify any string literal.



Answer 9:

A little bit of history on string literals, in Ritchie's words, mostly about their origin and evolution since K&R 1. I hope this clarifies a thing or two about const and string literals.

"From: Dennis Ritchie Subject: Re: History question: String literals. Date: 02 Jun 1998 Newsgroups: comp.std.c

At the time that the C89 committee was working, writable string literals weren't "legacy code" (Margolin) and what standard there existed (K&R 1) was quite explicit (A.2.5) that strings were just a way of initializing a static array. And as Barry pointed out there were some (mktemp) routines that used this fact.

I wasn't around for the committee's deliberations on the point, but I suspect that the BSD utility for fiddling the assembler code to move the initialization of strings to text instead of data, and the realization that most literal strings were not in fact overwritten, was more important than some very early version of gcc.

Where I think the committee might have missed something is in failure to find a formulation that explained the behavior of string literals in terms of const. That is, if "abc" is an anonymous literal of type const char [4] then just about all of its properties (including the ability to make read-only, and even to share its storage with other occurrences of the same literal) are nearly explained.

The problem with this was not only the relatively few places that string literals were actually written on, but much more important, working out feasible rules for assignments to pointers-to-const, in particular for function's actual arguments. Realistically the committee knew that whatever rules they formulated could not require a mandatory diagnostic for every func("string") in the existing world.

So they decided to leave "..." of ordinary char array type, but say one was required not to write over it.

This note, BTW, isn't intended to be read as a snipe at the formulation in C89. It is very hard to get things both right (coherent and correct) and usable (consistent enough, attractive enough).

Dennis

"