Can a string literal and a non-string non-compound

Posted 2019-10-07 19:04

String literals are lvalues, which leaves the door open to modify string literals.

From C in a Nutshell:

In C source code, a literal is a token that denotes a fixed value, which may be an integer, a floating-point number, a character, or a string. A literal’s type is determined by its value and its notation. The literals discussed here are different from compound literals, which were introduced in the C99 standard. Compound literals are ordinary modifiable objects, similar to variables.

Although C does not strictly prohibit modifying string literals, you should not attempt to do so. For one thing, the compiler, treating the string literal as a constant, may place it in read-only memory, in which case the attempted write operation causes a fault. For another, if two or more identical string literals are used in the program, the compiler may store them at the same location, so that modifying one causes unexpected results when you access another.

  1. The first paragraph says that "a literal in C denotes a fixed value".

    • Does it mean that a literal (except compound literals) shouldn't be modified?

    • Since a string literal isn't a compound literal, should a string literal be modified?

  2. The second paragraph says that "C does not strictly prohibit modifying string literals" while compilers do. So should a string literal be modified?
  3. Do the two paragraphs contradict each other? How shall I understand them?

  4. Can a literal which is neither compound literal nor string literal be modified?

4 answers
对你真心纯属浪费
#2 · 2019-10-07 19:22

Don't modify string literals. Treat them as char const[]. String literals are effectively char const[] (modifying them results in undefined behavior), but for legacy reasons their type is really char [], which means the compiler won't stop you from writing into them, but your program's behavior is still undefined if you do.
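A minimal sketch of that advice, assuming you need a changed version of the text: point at the literal through a pointer to const and do all writing in a buffer you own. The helper name capitalized_copy is made up for this illustration and is not from the question or the standard.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `literal` into the caller's buffer and
   upper-cases the first character of the copy. The literal is only read. */
char *capitalized_copy(char *buf, size_t bufsize, const char *literal)
{
    assert(buf != NULL && literal != NULL);
    assert(strlen(literal) < bufsize);  /* room for the terminating '\0' */

    strcpy(buf, literal);               /* writes go to buf, never to the literal */
    buf[0] = (char)toupper((unsigned char)buf[0]);
    return buf;
}
```

Calling capitalized_copy(buf, sizeof buf, "hello") with a 16-byte buf leaves "Hello" in buf, and the array behind "hello" is never written to. Declaring your own pointers to literals as const char * also lets the compiler reject accidental writes.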

爷的心禁止访问
#3 · 2019-10-07 19:32

From the C Standard (6.4.5 String literals)

7 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

As for your statement.

The second paragraph says that "C does not strictly prohibit modifying string literals" while compilers do. So should a string literal be modified?

Then compilers do not modify string literals. They may store identical string literals as one array.
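To illustrate the "unspecified whether these arrays are distinct" part, here is a small sketch. The function names are made up for this example. Whether the two pointers compare equal depends on the implementation and its options, so the code only reports the result and does not rely on it.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if both pointers refer to the same array, 0 otherwise. */
int same_storage(const char *a, const char *b)
{
    return a == b;
}

/* Returns 1 if both strings hold the same characters. */
int same_contents(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Reports whether this implementation stored two identical literals as one
   array. Both answers are permitted by 6.4.5p7, so portable code must not
   depend on either. */
int literals_are_merged(void)
{
    const char *first = "shared text";
    const char *second = "shared text";
    return same_storage(first, second);
}
```

The contents always compare equal. The addresses may or may not, which is exactly why writing through one literal could appear to change another.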

As @o11c pointed out in a comment, Annex J (informative), "Portability issues", says:

J.5 Common extensions

1 The following extensions are widely used in many systems, but are not portable to all implementations. The inclusion of any extension that may cause a strictly conforming program to become invalid renders an implementation nonconforming. Examples of such extensions are new keywords, extra library functions declared in standard headers, or predefined macros with names that do not begin with an underscore.

J.5.5 Writable string literals

1 String literals are modifiable (in which case, identical string literals should denote distinct objects) (6.4.5).

ゆ 、 Hurt°
#4 · 2019-10-07 19:32

Speaking more practically: not every hardware platform provides a mechanism to protect the memory where read-only objects are stored, so this had to be left as UB. There are three possible cases:

  1. Literals (and constant objects more generally) are kept in RAM, and the hardware does not provide memory protection. Nothing stops the programmer from writing to this location.

  2. Literals (and constant objects) are kept in RAM, and the hardware does provide memory protection. You will get a segfault.

  3. Read-only data is stored in read-only memory (for example, microcontroller flash). You can try to write to it, but the write has no effect (for example, on ARM). No hardware exception is raised.

时光不老,我们不散
#5 · 2019-10-07 19:33
  1. The first paragraph says that "a literal in C denotes a fixed value".
    • Does it mean that a literal (except compound literals) shouldn't be modified?

I don't know what the author's intention was, but modification of the array resulting from a string literal during runtime is blatantly undefined, according to C11/6.4.5p7: "If the program attempts to modify such an array, the behavior is undefined."

It should also be noted that an attempt to modify a const-qualified compound literal during runtime also results in undefined behaviour, which is explained alongside some volatile-related undefined behaviour in C11/6.7.3p6. Modifying a compound literal that is not const-qualified is otherwise well defined.

For example:

char *fubar = "hello world";
(*fubar)++; // SQUARELY UNDEFINED BEHAVIOUR!

char *fubar = (char[]){"hello world"};
(*fubar)++; // This is well defined.

Literally replacing "hello world" with "goodbye galaxy", in either piece of source code, is fine. Redefining standard functions, however (i.e. #define memcpy strncpy or #define size_t signed char, which are both great ways to ruin someone else's day), is undefined behaviour.
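For anyone who wants something compilable, here is a sketch of the well-defined alternatives to writing through a pointer to a string literal. The function names are made up for this illustration, and the undefined variant is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Increments the first character of a modifiable array and returns it. */
char bump_first(char *text)
{
    assert(text != NULL && text[0] != '\0');
    text[0]++;
    return text[0];
}

/* Well defined: `copy` is an ordinary array whose initial contents are
   copied from the literal. */
char bump_array_copy(void)
{
    char copy[] = "hello world";
    return bump_first(copy);
}

/* Well defined: a compound literal that is not const-qualified is a
   modifiable object. */
char bump_compound_literal(void)
{
    char *fubar = (char[]){"hello world"};
    return bump_first(fubar);
}

/* A const-qualified compound literal may be read, but must not be modified. */
char read_const_compound_literal(void)
{
    const char *read_only = (const char[]){"hello world"};
    return read_only[0];
}
```

Both bump functions return the character value 'h' + 1, because each one works on its own modifiable object and not on the array that belongs to the literal.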

  • Since a string literal isn't a compound literal, should a string literal be modified?

The array resulting from a string literal should certainly not be modified during runtime, for any attempt to do so would trigger undefined behaviour.

The string literal itself, which exists as a quoted sequence of characters within your source code, on the other hand... of course, that can be modified as you choose. You're not obliged to modify it, though.

The second paragraph says that "C does not strictly prohibit modifying string literals" while compilers do. So should a string literal be modified?

The C standard doesn't strictly prohibit a lot of undefined behavior; it leaves the behavior undefined, meaning your program is likely to behave erratically or be non-portable. In the realms of well defined C, your programs should not invoke any undefined behaviour, including overflowing arrays, modifying const-qualified objects or the arrays resulting from string literals, race conditions caused by multithreading, etc.

If you want to invoke undefined behaviour, C will let you shoot yourself in the foot. You might have a good reason for doing so; perhaps your program will be more optimal, or perhaps your compiler actually lets you modify string literals ("it's a feature, not a bug", they say, "so give us your money", they say, as you become reliant upon their non-standard quirks). Be aware that some compilers will instead behave as though the attempted modification didn't occur, or crash, or there could be some vulnerability caused.

... and above all else, be aware that your code will no longer be compliant C code!

Do the two paragraphs contradict each other?

By omission, perhaps. The first paragraph does state that the values are fixed, and the second paragraph that the values might be modifiable during runtime through invocation of undefined behaviour.

I think the author meant to make the distinction between elements of source code and the runtime environment. He/she could simply clarify this by ensuring it's explicit that literals should not be modified during runtime, for example.

How shall I understand them?

In the realms of C such values can't change during runtime because invoking undefined behaviour means the code in question is no longer compliant C code.

Perhaps they were trying to avoid explaining undefined behaviour, because it may seem too complex to explain. If you look deeper into the subject, you'll find that the meaning is, as predicted, roughly a conjunction of the two words.

undefined: /ʌndɪˈfʌɪnd/ adj. not clear or defined.
behaviour: /bɪˈheɪvjə/ noun. the way in which a machine or natural phenomenon works or functions.

That is to say, an attempt to modify the array resulting from a string literal during runtime results in "unclear functionality". It's not required to be documented anywhere in the realms of computer science, and even if it is documented, that documentation might be a lie.

Can a literal which is neither compound literal nor string literal be modified?

As a lexical element in source code, providing it doesn't override a standard symbol, yes. Literals which aren't l-values (i.e. don't have any storage) such as integer constants, obviously can't be modified during runtime. I suppose it might be possible on some systems to attempt to modify the memory which a function pointer points at, which could be seen as a literal; that's also undefined behaviour and would result in code that isn't C.
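A short sketch of the lvalue distinction, with made-up function names. A string literal designates an array object, so its address can be taken. An integer constant designates no object, so there is nothing to write to and you change a copy instead.

```c
#include <assert.h>
#include <stddef.h>

/* A string literal is an lvalue of array type, so `&` may be applied to it.
   "abc" has type char[4]: three characters plus the terminating '\0'. */
size_t literal_array_size(void)
{
    char (*whole_array)[4] = &"abc";  /* pointer to the array, used read-only */
    assert(sizeof *whole_array == sizeof "abc");
    return sizeof *whole_array;
}

/* An integer constant is not an lvalue. Each of these lines is a constraint
   violation that a conforming compiler must diagnose:
       42 = 43;
       int *p = &42;
   The usual pattern is to copy the value into an object and change that. */
int changed_copy_of_constant(void)
{
    int answer = 42;  /* `answer` is an object initialised from the constant */
    answer += 1;
    return answer;
}
```

literal_array_size() returns 4 and changed_copy_of_constant() returns 43. The constant 42 in the source is untouched in any meaningful sense, since it never had storage of its own.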

It might also be possible to modify many other types of elements which aren't seen as objects by the C standard, such as the return address on the stack. That's what makes buffer overflows so subtly dangerous!
