Just curious. This obviously isn't a very good solution for actual programming, but say I wanted to make an executable in Bless (a hex editor).
My architecture is x86. What's a very simple program I can make? A hello world? An infinite loop? Similar to this question, but in Linux.
As mentioned in my comment, you will essentially be writing your own elf-header for the executable eliminating the unneeded sections. There are still several required sections. The documentation at Muppetlabs-TinyPrograms does a fair job explaining this process. For fun, here are a couple of examples:
The equivalent of /bin/true (45 bytes):
Your classic 'Hello World!' (160 bytes):
Don't forget to make them executable...
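As a rough sketch of the same workflow in script form, the helper below turns a hex dump typed as text into a file and sets the executable bits, which is what `chmod +x` does. The name `write_executable` and the output path are made up for illustration, and the four sample bytes are only the ELF magic number, not a runnable program.

```python
import os
import stat


def write_executable(path, hex_text):
    """Write the bytes spelled out in hex_text to path and mark it executable."""
    data = bytes.fromhex(hex_text)  # spaces between byte pairs are skipped
    with open(path, "wb") as f:
        f.write(data)
    mode = os.stat(path).st_mode
    # Same effect as `chmod +x`: add execute permission for user, group and other.
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return len(data)


if __name__ == "__main__":
    # Hypothetical file name; replace the hex text with a full ELF image.
    print(write_executable("tiny.bin", "7f 45 4c 46"))
```

Typing the full byte sequence of one of the examples into the hex string would produce the same file as saving it from the hex editor.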
Decompile a NASM hello world and understand every byte in it
Version of this answer with a nice TOC and more content: http://www.cirosantilli.com/elf-hello-world (hitting the 30k char limit here)
Standards
ELF is specified by the LSB:
The LSB basically links to other standards with minor extensions, in particular:
generic (both by SCO):
architecture specific:
A handy summary can be found at:
Its structure can be examined in a human readable way via utilities like readelf and objdump.
Generate the example
Let's break down a minimal runnable Linux x86-64 example:
Compiled with:
Versions:
ld
We don't use a C program as that would complicate the analysis; that will be level 2 :-)
Hexdumps
Output at: https://gist.github.com/cirosantilli/7b03f6df2d404c0862c6
Global file structure
An ELF file contains the following parts:
ELF header. Points to the position of the section header table and the program header table.
Section header table (optional on executable). It has e_shnum section headers, each pointing to the position of a section.
N sections, with N <= e_shnum (optional on executable).
Program header table (only on executable). It has e_phnum program headers, each pointing to the position of a segment.
N segments, with N <= e_phnum (optional on executable).
The order of those parts is not fixed: the only fixed thing is the ELF header, which must be the first thing in the file. Generic docs say:
ELF header
The easiest way to observe the header is:
Output at: https://gist.github.com/cirosantilli/7b03f6df2d404c0862c6
Bytes in the object file:
Executable:
Structure represented:
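A minimal Python sketch of that layout, assuming a 64-bit little-endian file. The field order follows Elf64_Ehdr from the gABI. The names `parse_ehdr` and `OBJECT_HEADER` are made up here, and the sample bytes are assembled by hand from the object-file values quoted in this answer rather than read from a real file.

```python
import struct
from collections import namedtuple

# Little-endian Elf64_Ehdr: 16 e_ident bytes followed by the fixed-width fields.
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"
EHDR_FIELDS = (
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx"
)
Ehdr = namedtuple("Ehdr", EHDR_FIELDS)


def parse_ehdr(data):
    """Decode the first 64 bytes of a 64-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles ELFCLASS64 + ELFDATA2LSB")
    return Ehdr._make(struct.unpack_from(EHDR_FORMAT, data, 0))


# Object-file header rebuilt by hand from the byte values quoted in this answer.
OBJECT_HEADER = bytes.fromhex(
    "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 "
    "01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 "
    "00 00 00 00 40 00 00 00 00 00 40 00 07 00 03 00"
)

header = parse_ehdr(OBJECT_HEADER)
print(header.e_type, header.e_machine, hex(header.e_shoff), header.e_shnum)
# prints: 1 62 0x40 7
```

Passing the first 64 bytes of a real file, e.g. `open("hello_world.o", "rb").read(64)`, should give the same fields that readelf -h reports.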
Manual breakdown:
0 0: EI_MAG = 7f 45 4c 46 = 0x7f 'E', 'L', 'F': ELF magic number
0 4: EI_CLASS = 02 = ELFCLASS64: 64 bit ELF
0 5: EI_DATA = 01 = ELFDATA2LSB: little endian data
0 6: EI_VERSION = 01: format version
0 7: EI_OSABI (only in 2003 Update) = 00 = ELFOSABI_NONE: no extensions.
0 8: EI_PAD = 8x 00: reserved bytes. Must be set to 0.
1 0: e_type = 01 00 = 1 (little endian) = ET_REL: relocatable format
On the executable it is 02 00 for ET_EXEC.
1 2: e_machine = 3e 00 = 62 = EM_X86_64: AMD64 architecture
1 4: e_version = 01 00 00 00: must be 1
1 8: e_entry = 8x 00: execution address entry point, or 0 if not applicable, like for the object file since there is no entry point.
On the executable, it is b0 00 40 00 00 00 00 00. TODO: what else can we set this to? The kernel seems to put the IP directly on that value, it is not hardcoded.
2 0: e_phoff = 8x 00: program header table offset, 0 if not present.
40 00 00 00 on the executable, i.e. it starts immediately after the ELF header.
2 8: e_shoff = 40 7x 00 = 0x40: section header table file offset, 0 if not present.
3 0: e_flags = 00 00 00 00 TODO. Arch specific.
3 4: e_ehsize = 40 00: size of this ELF header. TODO why this field? How can it vary?
3 6: e_phentsize = 00 00: size of each program header, 0 if not present.
38 00 on executable: it is 56 bytes long
3 8: e_phnum = 00 00: number of program header entries, 0 if not present.
02 00 on executable: there are 2 entries.
3 A: e_shentsize and e_shnum = 40 00 07 00: section header size and number of entries
3 E: e_shstrndx (Section Header STRing iNDeX) = 03 00: index of the .shstrtab section.
Section header table
Array of Elf64_Shdr structs.
Each entry contains metadata about a given section.
e_shoff of the ELF header gives the starting position, 0x40 here.
e_shentsize and e_shnum from the ELF header say that we have 7 entries, each 0x40 bytes long.
So the table takes bytes from 0x40 to 0x40 + 7 * 0x40 - 1 = 0x1FF.
Some section names are reserved for certain section types: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections e.g. .text requires a SHT_PROGBITS type and SHF_ALLOC + SHF_EXECINSTR
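The extent of the section header table follows from three ELF header fields, and the arithmetic can be checked with a short Python sketch. The helper name `table_entry_ranges` is made up for illustration; the inputs are the e_shoff, e_shentsize and e_shnum values of the object file discussed here.

```python
def table_entry_ranges(table_offset, entry_size, entry_count):
    """Return (first_byte, last_byte) file offsets for each entry of a header table."""
    return [
        (table_offset + i * entry_size, table_offset + (i + 1) * entry_size - 1)
        for i in range(entry_count)
    ]


# e_shoff = 0x40, e_shentsize = 0x40, e_shnum = 7
ranges = table_entry_ranges(0x40, 0x40, 7)
print(hex(ranges[0][0]), hex(ranges[-1][1]))  # prints: 0x40 0x1ff
```

The same helper applies to the program header table of the executable by passing e_phoff, e_phentsize and e_phnum instead.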
readelf -S hello_world.o:
struct represented by each entry:
Sections
Index 0 section
Contained in bytes 0x40 to 0x7F.
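Each 0x40-byte entry can be decoded with a few lines of Python. This is only a sketch assuming the little-endian Elf64_Shdr layout from the gABI; `parse_shdr` and `flag_names` are hypothetical helper names, and only three of the SHF_* flag bits are listed. It is shown on an all-zero entry, which is what the index 0 entry looks like in an ordinary file.

```python
import struct

SHDR_FORMAT = "<IIQQQQIIQQ"  # Elf64_Shdr, little-endian, 0x40 bytes
SHDR_FIELDS = (
    "sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
    "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize",
)
# Subset of the flag bits defined by the gABI.
SHF_NAMES = {0x1: "SHF_WRITE", 0x2: "SHF_ALLOC", 0x4: "SHF_EXECINSTR"}


def parse_shdr(entry):
    """Decode one 64-byte section header into a dict keyed by field name."""
    return dict(zip(SHDR_FIELDS, struct.unpack(SHDR_FORMAT, entry)))


def flag_names(sh_flags):
    """List the names of the known flag bits set in sh_flags."""
    return [name for bit, name in sorted(SHF_NAMES.items()) if sh_flags & bit]


null_section = parse_shdr(bytes(0x40))
print(null_section["sh_type"])  # 0, i.e. SHT_NULL
print(flag_names(0x3))          # ['SHF_WRITE', 'SHF_ALLOC']
```

For a real file, slicing `data[0x40 + i * 0x40 : 0x40 + (i + 1) * 0x40]` gives the bytes of entry `i` to pass to `parse_shdr`.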
The first section is always magic: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html says:
There are also other magic sections detailed in Figure 4-7: Special Section Indexes.
SHT_NULL
In index 0, SHT_NULL is mandatory. Are there any other uses for it: What is the use of the SHT_NULL section in ELF? ?
.data section
.data is section 1:
80 0: sh_name = 01 00 00 00: index 1 in the .shstrtab string table
Here, 1 says the name of this section starts at the first character of that section, and ends at the first NUL character, making up the string .data.
.data is one of the section names which has a predefined meaning http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html
80 4: sh_type = 01 00 00 00: SHT_PROGBITS: the section content is not specified by ELF, only by how the program interprets it. Normal since a .data section.
80 8: sh_flags = 03 7x 00: SHF_WRITE and SHF_ALLOC: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#sh_flags, as required from a .data section
90 0: sh_addr = 8x 00: in what virtual address the section will be placed during execution, 0 if not placed
90 8: sh_offset = 00 02 00 00 00 00 00 00 = 0x200: number of bytes from the start of the program to the first byte in this section
a0 0: sh_size = 0d 00 00 00 00 00 00 00
If we take 0xD bytes starting at sh_offset 200, we see:
AHA! So our "Hello world!" string is in the data section like we told it to be on the NASM.
Once we graduate from hd, we will look this up like:
which outputs:
NASM sets decent properties for that section because it treats .data magically: http://www.nasm.us/doc/nasmdoc7.html#section-7.9.2
Also note that this was a bad section choice: a good C compiler would put the string in .rodata instead, because it is read-only and it would allow for further OS optimizations.
a0 8: sh_link and sh_info = 8x 0: do not apply to this section type. http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections
b0 0: sh_addralign = 04 = TODO: why is this alignment necessary? Is it only for sh_addr, or also for symbols inside sh_addr?
b0 8: sh_entsize = 00 = the section does not contain a table. If != 0, it means that the section contains a table of fixed size entries. In this file, we see from the readelf output that this is the case for the .symtab and .rela.text sections.
.text section
Now that we've done one section manually, let's graduate and use the readelf -S of the other sections.
.text is executable but not writable: if we try to write to it Linux segfaults. Let's see if we really have some code there:
gives:
If we grep b8 01 00 00 on the hd, we see that this only occurs at 00000210, which is what the section says. And the Size is 27, which matches as well. So we must be talking about the right section.
This looks like the right code: a write followed by an exit.
The most interesting part is line a which does:
to pass the address of the string to the system call. Currently, the 0x0 is just a placeholder. After linking happens, it will be modified to contain:
This modification is possible because of the data of the .rela.text section.
SHT_STRTAB
Sections with sh_type == SHT_STRTAB are called string tables.
They hold a null separated array of strings.
Such sections are used by other sections when string names are to be used. The using section says:
So for example, we could have a string table containing: TODO: does it have to start with \0?
And if another section wants to use the string d e f, they have to point to index 5 of this section (letter d).
Notable string table sections:
.shstrtab
.strtab
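Looking a string up by index can be sketched in a few lines of Python. The helper name `strtab_lookup` is made up, and both sample tables are assumptions: the first is constructed so that index 5 lands on the letter d as in the example above, the second mimics the start of a .shstrtab whose first two names are .data and .text.

```python
def strtab_lookup(table, index):
    """Return the string that starts at index and runs up to the next NUL byte."""
    end = table.index(b"\0", index)
    return table[index:end].decode("ascii")


example = b"\0abc\0def\0"
print(strtab_lookup(example, 5))  # prints: def

shstrtab_start = b"\0.data\0.text\0"
print(strtab_lookup(shstrtab_start, 1))  # prints: .data
print(strtab_lookup(shstrtab_start, 7))  # prints: .text
```

Index 0 points at the leading NUL and therefore yields the empty string.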
.shstrtab
Section type: sh_type == SHT_STRTAB.
Common name: section header string table.
The section name .shstrtab is reserved. The standard says:
This section gets pointed to by the e_shstrndx field of the ELF header itself.
String indexes of this section are pointed to by the sh_name field of section headers, which denote strings.
This section does not have SHF_ALLOC marked, so it will not appear on the executing program.
Gives:
The data in this section has a fixed format: http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html
If we look at the names of other sections, we see that they all contain numbers, e.g. the .text section is number 7.
Then each string ends when the first NUL character is found, e.g. character 12 is \0 just after .text\0.
.symtab
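Section type: sh_type == SHT_SYMTAB.
Common name: symbol table.
First we note that:
sh_link = 5
sh_info = 6
For SHT_SYMTAB sections, those numbers mean that:
sh_link is the section index of the associated string table, here section 5, .strtab
sh_info is, per the gABI, one greater than the symbol table index of the last local symbol; 6 also happens to be the section index of .rela.text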
A good high level tool to disassemble that section is:
which gives:
This is however a high level view that omits some types of symbols. A more detailed disassembly can be obtained with:
which gives:
The binary format of the table is documented at http://www.sco.com/developers/gabi/2003-12-17/ch4.symtab.html
The data is:
Which gives:
The entries are of type:
Like in the section table, the first entry is magical and set to fixed meaningless values.
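A single symbol table entry can be decoded as follows. This is a sketch assuming the little-endian Elf64_Sym layout (24 bytes); `parse_sym` is a hypothetical helper name. The sample bytes are put together from the values this answer quotes for the STT_FILE entry, with st_other assumed to be 0.

```python
import struct

SYM_FORMAT = "<IBBHQQ"  # Elf64_Sym, little-endian, 24 bytes
SHN_ABS = 0xFFF1
STT_FILE = 4
STB_LOCAL = 0


def parse_sym(entry):
    """Decode one 24-byte symbol table entry, splitting st_info into bind and type."""
    st_name, st_info, st_other, st_shndx, st_value, st_size = struct.unpack(
        SYM_FORMAT, entry
    )
    return {
        "st_name": st_name,
        "bind": st_info >> 4,   # upper 4 bits (ELF64_ST_BIND)
        "type": st_info & 0xF,  # lower 4 bits (ELF64_ST_TYPE)
        "st_other": st_other,
        "st_shndx": st_shndx,
        "st_value": st_value,
        "st_size": st_size,
    }


# st_name = 1, st_info = 0x04, st_other = 0 (assumed), st_shndx = f1 ff, rest zero.
FILE_ENTRY = bytes.fromhex("01 00 00 00 04 00 f1 ff") + bytes(16)
sym = parse_sym(FILE_ENTRY)
print(sym["type"] == STT_FILE, sym["bind"] == STB_LOCAL, hex(sym["st_shndx"]))
# prints: True True 0xfff1
```

Note that the two bytes f1 ff read as a little-endian 16-bit value give 0xFFF1, which is SHN_ABS.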
STT_FILE
Entry 1 has ELF64_ST_TYPE == STT_FILE.
ELF64_ST_TYPE is contained inside of st_info.
Byte analysis:
10 8: st_name = 01000000 = character 1 in the .strtab, which until the following \0 makes hello_world.asm
This piece of information may be used by the linker to decide on which segment sections go.
10 12: st_info = 04
Bits 0-3 = ELF64_ST_TYPE = Type = 4 = STT_FILE: the main purpose of this entry is to use st_name to indicate the name of the file which generated this object file.
Bits 4-7 = ELF64_ST_BIND = Binding = 0 = STB_LOCAL. Required value for STT_FILE.
10 14: st_shndx = Symbol Table Section header Index = f1 ff = SHN_ABS. Required for STT_FILE.
20 0: st_value = 8x 00: required value for STT_FILE
20 8: st_size = 8x 00: no allocated size
Now from the readelf, we interpret the others quickly.
STT_SECTION
There are two such entries, one pointing to .data and the other to .text (section indexes 1 and 2).
TODO what is their purpose?
STT_NOTYPE
Then come the most important symbols:
hello_world string is in the .data section (index 1). Its value is 0: it points to the first byte of that section.
_start is marked with GLOBAL binding since we wrote:
in NASM. This is necessary since it must be seen as the entry point. Unlike in C, by default NASM labels are local.
SHN_ABS
hello_world_len points to the special st_shndx == SHN_ABS == 0xFFF1 (bytes f1 ff in the file).
0xFFF1 is chosen so as to not conflict with other sections.
st_value == 0xD == 13 which is the value we have stored there on the assembly: the length of the string Hello World!.
This means that relocation will not affect this value: it is a constant.
This is a small optimization that our assembler does for us and which has ELF support.
If we had used the address of hello_world_len anywhere, the assembler would not have been able to mark it as SHN_ABS, and the linker would have extra relocation work on it later.
SHT_SYMTAB on the executable
By default, NASM places a .symtab on the executable as well.
This is only used for debugging. Without the symbols, we are completely blind, and must reverse engineer everything.
You can strip it with objcopy, and the executable will still run. Such executables are called stripped executables.
.strtab
Holds strings for the symbol table.
This section has sh_type == SHT_STRTAB.
It is pointed to by sh_link == 5 of the .symtab section.
Gives:
This implies that it is an ELF level limitation that symbol names, such as those of global variables, cannot contain NUL characters.
.rela.text
Section type: sh_type == SHT_RELA.
Common name: relocation section.
.rela.text holds relocation data which says how the address should be modified when the final executable is linked. This points to bytes of the text area that must be modified when linking happens to point to the correct memory locations.
Basically, it translates the object text containing the placeholder 0x0 address:
to the actual executable code containing the final 0x6000d8:
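The patching step itself is small. The sketch below assumes a 64-bit absolute relocation, where the value written is symbol value plus addend, stored as 8 little-endian bytes. `apply_abs64` is a hypothetical helper name, and the buffer is a zero-filled stand-in for .text in which only the two opcode bytes of the movabs instruction are filled in; the offset 0xC and the address 0x6000d8 are the ones from this example.

```python
import struct


def apply_abs64(text, r_offset, symbol_value, addend):
    """Write symbol_value + addend as a 64-bit little-endian value at r_offset."""
    value = (symbol_value + addend) & 0xFFFFFFFFFFFFFFFF
    struct.pack_into("<Q", text, r_offset, value)
    return value


# Stand-in for the .text bytes (assumed 0x27 bytes long, mostly left as zeros).
text = bytearray(0x27)
text[0xA:0xC] = b"\x48\xbe"  # movabs $imm64,%rsi: the immediate starts at 0xC

apply_abs64(text, 0xC, 0x6000D8, 0)
print(text[0xA:0x14].hex(" "))  # prints: 48 be d8 00 60 00 00 00 00 00
```

A real linker first decides where each section lands in memory, which is where the final symbol value comes from.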
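Its section index, 6, happens to equal the sh_info = 6 of the .symtab section, but for SHT_SYMTAB the gABI defines sh_info as one greater than the index of the last local symbol. The link goes the other way: the sh_link and sh_info of a SHT_RELA section give the symbol table and the section to be modified.
readelf -r hello_world.o gives:
The section does not exist in the executable.
The actual bytes are:
The struct represented is:
So: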
370 0: r_offset = 0xC: address into the .text whose address this relocation will modify
370 8: r_info = 0x200000001. Contains 2 fields:
ELF64_R_TYPE = 0x1: meaning depends on the exact architecture.
ELF64_R_SYM = 0x2: index of the symbol in .symtab to which the address points, here the section symbol for .data, which is at symbol index 2.
The AMD64 ABI says that type 1 is called R_X86_64_64 and that it represents the operation S + A where:
S: the value of the symbol on the object file, here 0 because we point to the 00 00 00 00 00 00 00 00 of movabs $0x0,%rsi
A: the addend, present in field r_addend
This address is added to the section on which the relocation operates.
This relocation operation acts on a total of 8 bytes.
380 0: r_addend = 0
So in our example we conclude that the new address will be: S + A = .data + 0, and thus the first thing in the data section.
Program header table
Only appears in the executable.
Contains information of how the executable should be put into the process virtual memory.
The executable is generated from object files by the linker. The main jobs that the linker does are:
determine which sections of the object files will go into which segments of the executable.
In Binutils, this comes down to parsing a linker script, and dealing with a bunch of defaults.
You can get the linker script used with ld --verbose, and set a custom one with ld -T.
do relocation on text sections. This depends on how the multiple sections are put into memory.
readelf -l hello_world.out gives:
On the ELF header, e_phoff, e_phnum and e_phentsize told us that there are 2 program headers, which start at 0x40 and are 0x38 bytes long each, so they are:
and:
Structure represented http://www.sco.com/developers/gabi/2003-12-17/ch5.pheader.html:
Breakdown of the first one:
p_type = 01 00 00 00 = PT_LOAD: TODO. I think it means it will be actually loaded into memory. Other types may not necessarily be.
p_flags = 05 00 00 00 = execute and read permissions, no write TODO
p_offset = 8x 00 TODO: what is this? Looks like offsets from the beginning of segments. But this would mean that some segments are intertwined? It is possible to play with it a bit with: gcc -Wl,-Ttext-segment=0x400030 hello_world.c
p_vaddr = 00 00 40 00 00 00 00 00: initial virtual memory address to load this segment to
p_paddr = 00 00 40 00 00 00 00 00: initial physical address to load in memory. Only matters for systems in which the program can set its physical address. Otherwise, as in System V like systems, can be anything. The linker seems to just copy p_vaddr
p_filesz = d7 00 00 00 00 00 00 00: TODO vs p_memsz
p_memsz = d7 00 00 00 00 00 00 00: TODO
p_align = 00 00 20 00 00 00 00 00: 0 or 1 mean no alignment required TODO what does that mean? otherwise redundant with other fields
The second is analogous.
Then the:
section of the readelf tells us that:
0 is the .text segment. Aha, so this is why it is executable, and not writable
1 is the .data segment.
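To double check the manual breakdown, a program header can be decoded with a short Python sketch. It assumes the little-endian Elf64_Phdr layout (0x38 bytes); `parse_phdr` and `perm_string` are hypothetical helper names, and the sample bytes are the field values of the first program header as listed above, concatenated in struct order.

```python
import struct

PHDR_FORMAT = "<IIQQQQQQ"  # Elf64_Phdr, little-endian, 0x38 bytes
PHDR_FIELDS = (
    "p_type", "p_flags", "p_offset", "p_vaddr",
    "p_paddr", "p_filesz", "p_memsz", "p_align",
)
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def parse_phdr(entry):
    """Decode one 56-byte program header into a dict keyed by field name."""
    return dict(zip(PHDR_FIELDS, struct.unpack(PHDR_FORMAT, entry)))


def perm_string(p_flags):
    """Render p_flags in rwx form."""
    return (
        ("r" if p_flags & PF_R else "-")
        + ("w" if p_flags & PF_W else "-")
        + ("x" if p_flags & PF_X else "-")
    )


FIRST_PHDR = bytes.fromhex(
    "01 00 00 00 05 00 00 00 "
    "00 00 00 00 00 00 00 00 "
    "00 00 40 00 00 00 00 00 "
    "00 00 40 00 00 00 00 00 "
    "d7 00 00 00 00 00 00 00 "
    "d7 00 00 00 00 00 00 00 "
    "00 00 20 00 00 00 00 00"
)

phdr = parse_phdr(FIRST_PHDR)
print(perm_string(phdr["p_flags"]), hex(phdr["p_vaddr"]), hex(phdr["p_filesz"]))
# prints: r-x 0x400000 0xd7
```

With two 0x38-byte headers starting at 0x40, the table ends at byte 0xAF, so the next byte of the file is at offset 0xB0, which lines up with the e_entry value of 0x4000b0 for a segment loaded at 0x400000.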