This code is from a Linux man page:
#include <stdio.h>
#include <stdlib.h>
extern char etext, edata, end;
int main() {
    printf("First address past:\n");
    printf(" program text (etext) %10p\n", &etext);
    printf(" initialized data (edata) %10p\n", &edata);
    printf(" uninitialized data (end) %10p\n", &end);
    exit(EXIT_SUCCESS);
}
When run, the program above produces output such as the following:
$ ./a.out
First address past:
program text (etext) 0x8048568
initialized data (edata) 0x804a01c
uninitialized data (end) 0x804a024
Where are etext, edata, and end defined? How are those symbols assigned values? Is it done by the linker or by something else?
These symbols are defined in a linker script file.
Note that on Mac OS X, the code above may not work! Instead you can have:
#include <stdio.h>
#include <stdlib.h>
#include <mach-o/getsect.h>
int main(int argc, char *argv[])
{
    printf(" program text (etext) %10p\n", (void*)get_etext());
    printf(" initialized data (edata) %10p\n", (void*)get_edata());
    printf(" uninitialized data (end) %10p\n", (void*)get_end());
    exit(EXIT_SUCCESS);
}
Those symbols mark the first address past the end of the program's text, initialized data, and uninitialized data (BSS) segments. They are set by the linker.
What GCC does
Expanding on kgiannakakis's answer a bit more.
Those symbols are defined by the PROVIDE keyword of the linker script, documented at https://sourceware.org/binutils/docs-2.25/ld/PROVIDE.html#PROVIDE
The default scripts are generated when you build Binutils, and embedded into the ld executable: external files that may be installed in your distribution, like those in /usr/lib/ldscripts, are not used by default.
Echo the linker script to be used:
ld -verbose | less
In binutils 2.24 it contains:
.text :
{
*(.text.unlikely .text.*_unlikely .text.unlikely.*)
*(.text.exit .text.exit.*)
*(.text.startup .text.startup.*)
*(.text.hot .text.hot.*)
*(.text .stub .text.* .gnu.linkonce.t.*)
/* .gnu.warning sections are handled specially by elf32.em. */
*(.gnu.warning)
}
.fini :
{
KEEP (*(SORT_NONE(.fini)))
}
PROVIDE (__etext = .);
PROVIDE (_etext = .);
PROVIDE (etext = .);
.rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
.rodata1 : { *(.rodata1) }
So we also discover that:
- __etext and _etext will also work
- etext is not the end of the .text section, but rather of .fini, which also contains code
- etext is not at the end of the segment, with .rodata following it, since Binutils dumps all read-only sections into the same segment
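PROVIDE defines a symbol only if it is referenced but not defined by any object included in the link, which makes it behave much like a weak symbol: if you also define those symbols in your C code, your definition will win and hide this one.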
Minimal Linux 32-bit example
To truly understand how things work, I like to create minimal examples!
main.S:
.section .text
/* Exit system call. */
mov $1, %eax
/* Exit status. */
mov sdata, %ebx
int $0x80
.section .data
.byte 2
link.ld:
SECTIONS
{
. = 0x400000;
.text :
{
*(.text)
sdata = .;
*(.data)
}
}
Compile and run:
as --32 -o main.o main.S
ld -m elf_i386 -o main -T link.ld main.o
./main
echo $?
Output:
2
Explanation: sdata points to the first byte of the .data section that follows.
So by controlling the first byte of that section, we control the exit status!
This example on GitHub.