String literals: Where do they go?

Posted 2018-12-31 01:02

I am interested in where string literals get allocated/stored.

I did find one intriguing answer here, saying:

Defining a string inline actually embeds the data in the program itself and cannot be changed (some compilers allow this by a smart trick, don't bother).

But, it had to do with C++, not to mention that it says not to bother.

I am bothering. =D

So my question is where and how is my string literal kept? Why should I not try to alter it? Does the implementation vary by platform? Does anyone care to elaborate on the "smart trick?"

8 answers
若你有天会懂
#2 · 2018-12-31 01:14

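String literals are frequently placed in read-only memory, which makes them immutable. Some compilers nevertheless let you modify one, and the "smart trick" is simply to write through a character pointer that points at the literal's memory. Remember that this is undefined behavior, and many compilers and platforms do not allow it (the write typically crashes the program). Here is a demo:

char *tabHeader = "Sound";
*tabHeader = 'L';            // undefined behavior: writes into the literal itself
printf("%s\n", tabHeader);   // may display "Lound" where literals are writable; often crashes instead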
春风洒进眼中
#3 · 2018-12-31 01:20

As this might differ from compiler to compiler, the most reliable way is to search an object dump for the string literal in question:

objdump -s main.o | grep -B 1 str

where -s forces objdump to display the full contents of all sections, main.o is the object file, -B 1 forces grep to also print one line before the match (so that you can see the section name) and str is the string literal you're searching for.

With gcc on a Windows machine, and one variable declared in main like

char *c = "whatever";

running

objdump -s main.o | grep -B 1 whatever

returns

Contents of section .rdata:
 0000 77686174 65766572 00000000           whatever....
笑指拈花
#4 · 2018-12-31 01:21

There is no one answer to this. The C and C++ standards just say that string literals have static storage duration, any attempt at modifying them gives undefined behavior, and multiple string literals with the same contents may or may not share the same storage.

Depending on the system you're writing for, and the capabilities of the executable file format it uses, they may be stored along with the program code in the text segment, or they may have a separate segment for initialized data.

How you determine the details also varies by platform; most probably include tools that can tell you where the literal ends up. Some will even give you control over details like that, if you want it (e.g. GNU ld lets you supply a script that tells it how to group data, code, etc.).

孤独寂梦人
#5 · 2018-12-31 01:24

gcc makes a .rodata section that gets mapped "somewhere" in the address space and is marked read-only.

Visual C++ (cl.exe) makes a .rdata section for the same purpose.

You can look at the output from dumpbin (on Windows) or objdump (on Linux) to see the sections of your executable.

E.g.

>dumpbin vec1.exe
Microsoft (R) COFF/PE Dumper Version 8.00.50727.762
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file vec1.exe

File Type: EXECUTABLE IMAGE

  Summary

        4000 .data
        5000 .rdata  <-- here are strings and other read-only stuff.
       14000 .text
一个人的天荒地老
#6 · 2018-12-31 01:28

It depends on the format of your executable. One way to think about it is that if you were programming in assembly, you might put string literals in the data segment of your program. Your C compiler does something like that, but it all depends on what system your binary is being compiled for.

人气声优
#7 · 2018-12-31 01:30

Why should I not try to alter it?

Because it is undefined behavior. Quote from C99 N1256 draft 6.7.8/32 "Initialization":

EXAMPLE 8: The declaration

char s[] = "abc", t[3] = "abc";

defines "plain" char array objects s and t whose elements are initialized with character string literals.

This declaration is identical to

char s[] = { 'a', 'b', 'c', '\0' },
t[] = { 'a', 'b', 'c' };

The contents of the arrays are modifiable. On the other hand, the declaration

char *p = "abc";

defines p with type "pointer to char" and initializes it to point to an object with type "array of char" with length 4 whose elements are initialized with a character string literal. If an attempt is made to use p to modify the contents of the array, the behavior is undefined.

Where do they go?

GCC 4.8 x86-64 ELF Ubuntu 14.04:

  • char s[]: stack
  • char *s:
    • .rodata section of the object file
    • the same segment where the .text section of the object file gets dumped, which has Read and Exec permissions, but not Write

Program:

#include <stdio.h>

int main() {
    char *s = "abc";
    printf("%s\n", s);
    return 0;
}

Compile and disassemble:

gcc -ggdb -std=c99 -c main.c
objdump -Sr main.o

Output contains:

 char *s = "abc";
8:  48 c7 45 f8 00 00 00    movq   $0x0,-0x8(%rbp)
f:  00 
        c: R_X86_64_32S .rodata

So the string is stored in the .rodata section.

Then, on the linked executable:

readelf -l a.out

Contains (simplified):

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000704 0x0000000000000704  R E    200000

 Section to Segment mapping:
  Segment Sections...
   02     .text .rodata

This means that the default linker script puts both .text and .rodata into a segment that can be executed but not modified (Flags = R E). Attempting to modify such a segment leads to a segfault on Linux.

If we do the same for char[]:

 char s[] = "abc";

we obtain:

17:   c7 45 f0 61 62 63 00    movl   $0x636261,-0x10(%rbp)

so it gets stored on the stack (relative to %rbp), and we can of course modify it.
