I am interested in where string literals get allocated/stored.
I did find one intriguing answer here, saying:
Defining a string inline actually embeds the data in the program itself and cannot be changed (some compilers allow this by a smart trick, don't bother).
But it was about C++, not to mention that it says not to bother.
I am bothering. =D
So my question is where and how is my string literal kept? Why should I not try to alter it? Does the implementation vary by platform? Does anyone care to elaborate on the "smart trick?"
String literals are frequently allocated to read-only memory, making them immutable. However, some compilers allow modification by a "smart trick": using a character pointer that points to the literal's memory. Remember that some compilers may not allow this, and that even where it appears to work, writing to a string literal is undefined behavior.
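Because the trick relies on undefined behavior, the portable way to get a writable string that starts out with a literal's contents is to copy the literal into storage you own. Here is a minimal sketch under that assumption. The helper name `writable_copy` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, writable copy of `literal`
   (NULL if allocation fails). The caller must free() the result.

   Do NOT do this instead:
       char *p = "hello";
       p[0] = 'H';   // undefined behavior: may crash if the literal is in read-only memory
*/
char *writable_copy(const char *literal)
{
    size_t len = strlen(literal) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, literal, len);     /* reads the literal, never writes it */
    return copy;
}
```

Usage: `char *s = writable_copy("hello"); s[0] = 'H'; free(s);`. POSIX `strdup` (also standard since C23) does the same job.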
Because this can differ from compiler to compiler, the best way is to filter an object dump for the string literal you are searching for: `objdump -s main.o | grep -B 1 str`. Here `-s` forces `objdump` to display the full contents of all sections, `main.o` is the object file, `-B 1` forces `grep` to also print one line before the match (so that you can see the section name), and `str` is the string literal you are searching for.
With gcc on a Windows machine and one variable declared in `main`, running this command on the object file shows which section holds the literal.
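To try the search yourself, you need an object file that contains a literal. Here is a minimal sketch, assuming a file named `main.c` built with `gcc -c main.c`. The function name `greeting` and the literal text are placeholders. Running `objdump -s main.o | grep -B 1 hello` should then print the dump line containing `hello`. `objdump -s` shows 16 bytes per line, so searching for a short fragment avoids missing a literal that is split across two lines.

```c
/* Placeholder function whose only purpose is to put a string literal into
   main.o. With gcc on ELF targets it typically lands in .rodata or a variant
   such as .rodata.str1.1. The exact section name depends on the compiler,
   the options and the target. */
const char *greeting(void)
{
    return "hello";   /* the literal to search for in the object dump */
}
```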
There is no one answer to this. The C and C++ standards just say that string literals have static storage duration, any attempt at modifying them gives undefined behavior, and multiple string literals with the same contents may or may not share the same storage.
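Here is a short sketch of two consequences of those rules. The function names are hypothetical. A literal's static storage duration makes it valid to return a pointer to it. Whether identical literals share storage is unspecified, so the comparison can go either way.

```c
/* A string literal has static storage duration, so returning a pointer to it
   is valid: the array outlives the call. (Returning a pointer to a local
   char array would not be.) */
const char *literal_name(void)
{
    return "static storage";
}

/* Whether two identical literals share storage is unspecified, so this may
   return 1 or 0 depending on the compiler and its options. */
int identical_literals_share_storage(void)
{
    const char *a = "same text";
    const char *b = "same text";
    return a == b;
}
```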
Depending on the system you're writing for, and the capabilities of the executable file format it uses, they may be stored along with the program code in the text segment, or they may have a separate segment for initialized data.
Determining the details varies by platform as well. Most platforms probably include tools that can tell you where the literals are placed. Some even give you control over details like that, if you want it (e.g. GNU ld lets you supply a linker script that tells it how to group data, code, etc.).
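GCC and Clang offer a similar kind of control on the compiler side through the `section` attribute, which is a compiler extension and not standard C. Here is a minimal sketch, assuming an ELF target. The section name `my_strings` is hypothetical. The attribute places an array initialized from a literal, not the anonymous literal itself.

```c
/* GCC/Clang extension: put this array in a section named "my_strings"
   instead of the default read-only data section. The array's contents
   are copied from the literal at compile time. */
__attribute__((section("my_strings")))
const char tagged_message[] = "placed by attribute";
```

After `gcc -c file.c`, `objdump -s -j my_strings file.o` should show the text. A GNU ld linker script can then place `my_strings` wherever you need it.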
gcc makes a `.rodata` section that gets mapped "somewhere" in the address space and is marked read-only. Visual C++ (`cl.exe`) makes an `.rdata` section for the same purpose. You can look at the output from `dumpbin` or `objdump` (on Linux) to see the sections of your executable.
It depends on the format of your executable. One way to think about it: if you were programming in assembly, you might put string literals in the data segment of your assembly program. Your C compiler does something like that, but it all depends on what system your binary is being compiled for.
Why should I not try to alter it?
Because it is undefined behavior, as stated in the C99 N1256 draft, 6.7.8/32 "Initialization".
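That paragraph is EXAMPLE 8. It contrasts `char s[] = "abc", t[3] = "abc";`, which declares ordinary modifiable arrays, with `char *p = "abc";`, which declares a pointer to the literal's array that must not be used to modify it. The small sketch below makes the resulting types visible with `sizeof`. The function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical demo: the three declarations from the standard's example
   have different types, which sizeof makes visible. */
void initialization_sizes(size_t *s_size, size_t *t_size, size_t *p_size)
{
    char s[] = "abc";   /* char[4]: 'a','b','c','\0'; a modifiable array */
    char t[3] = "abc";  /* char[3]: no room for '\0', so it is dropped (valid C, an error in C++) */
    char *p = "abc";    /* char *: points at the literal's array; modifying through p is undefined */

    *s_size = sizeof s; /* 4 */
    *t_size = sizeof t; /* 3 */
    *p_size = sizeof p; /* size of a pointer, e.g. 8 on x86-64 */
}
```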
Where do they go?
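GCC 4.8 x86-64 ELF Ubuntu 14.04: with `char s[]`, the string is stored on the stack. With `char *s`, the string goes into the `.rodata` section of the object file. The linker places that section in the same segment as the `.text` section of the object file, and that segment has Read and Exec permissions, but not Write.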
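Here is a minimal sketch of a translation unit that exercises both declarations, assuming GCC on x86-64 Linux. The file name `main.c` and the function names are hypothetical. You could compile it with something like `gcc -g -c main.c` and disassemble it with `objdump -Sr main.o`, where `-S` interleaves the source and `-r` shows relocations. Running `readelf -l` on a linked executable lists its segments and their flags. At `-O0`, the pointer version typically loads the literal's address through a relocation against `.rodata`. The array version stores the characters into its own stack frame.

```c
#include <string.h>

/* char *s: s itself is a local, but it points at the literal, which GCC
   typically places in .rodata. Returning it is fine (static storage duration). */
const char *pointer_version(void)
{
    char *s = "abc";
    return s;
}

/* char s[]: the whole array lives on the stack (at -O0 on x86-64, relative
   to %rbp), initialized with a copy of the characters, so it may be modified. */
void array_version(char out[4])
{
    char s[] = "abc";
    s[0] = 'A';
    memcpy(out, s, sizeof s);   /* copy all 4 bytes, including '\0' */
}
```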
Compiling and decompiling the program shows that the string is stored in the `.rodata` section. Inspecting the program headers of the linked executable then shows that the default linker script dumps both `.text` and `.rodata` into a segment that can be executed but not modified (Flags = `R E`). Attempting to modify such a segment leads to a segfault on Linux.
If we do the same for `char[]`, the disassembly shows that the string is stored on the stack (relative to `%rbp`), and we can of course modify it.
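You can also observe those permissions from inside a running process. Here is a Linux-only sketch, assuming `/proc/self/maps` is available; it is a Linux procfs file, not part of C. The helper name `mapping_perms` is hypothetical. It reports the permissions of the mapping that contains a given address.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Linux-only: find the mapping in /proc/self/maps that contains `addr` and
   copy its permission string (e.g. "r--p" or "rw-p") into perms.
   Returns 1 if a mapping was found, 0 otherwise. */
int mapping_perms(const void *addr, char perms[5])
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return 0;

    char line[4096];
    uintptr_t target = (uintptr_t)addr;
    int found = 0;
    while (!found && fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char p[5];
        /* Each line starts with "start-end perms ...", addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) == 3
            && target >= start && target < end) {
            strcpy(perms, p);
            found = 1;
        }
    }
    fclose(maps);
    return found;
}
```

On a typical Linux build, a literal's address falls in a mapping without `w`. Depending on the linker, that mapping shows as `r--p` or `r-xp`. A local `char[]` falls in the `rw-p` stack mapping.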