I noticed a GNU asm relocation syntax for ARM 64-bit assembly. What are those pieces like #:abs_g0_nc: and :pg_hi21:? Where are they explained? Is there a pattern to them or are they made up on the go? Where can I learn more?
Introduction
ELF64 defines two types of relocation entries, called REL and RELA. Both carry r_offset and r_info; RELA entries additionally carry an explicit r_addend.
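As a rough illustration, the two entry layouts can be described with Python's struct module. The field names and widths follow the ELF64 definitions of Elf64_Rel and Elf64_Rela; the little-endian byte order and the sample values are assumptions made for this sketch.

```python
import struct

# Elf64_Rel:  r_offset (unsigned 64), r_info (unsigned 64)
# Elf64_Rela: r_offset (unsigned 64), r_info (unsigned 64), r_addend (signed 64)
# "<" assumes a little-endian ELF file and disables padding.
ELF64_REL = struct.Struct("<QQ")
ELF64_RELA = struct.Struct("<QQq")

# An arbitrary, made-up RELA entry, packed and then unpacked again.
raw = ELF64_RELA.pack(0x1000, 0x0000000500000101, -8)
r_offset, r_info, r_addend = ELF64_RELA.unpack(raw)

print(ELF64_REL.size, ELF64_RELA.size)
print(hex(r_offset), hex(r_info), r_addend)
```

The only difference between the two formats is the trailing signed addend.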
The purpose of each relocation entry is to give the loader (static or dynamic) four pieces of information:
- The virtual address or the offset of the instruction to patch. This is given by r_offset.
- The runtime address of the symbol accessed. This is given by the higher part of r_info.
- A custom value called the addend. This value is eventually used as an operand in the expression that calculates the value written to patch the instruction. RELA entries have this value in r_addend; REL entries extract it from the relocation site.
- The relocation type. This determines the expression used to calculate the value that patches the instruction. It is encoded in the lower part of r_info.
Relocating
During the relocation phase the loader goes through all the relocation entries and writes to the location specified by each r_offset, using a formula chosen by the lower part of r_info to compute the value to be stored from the addend (r_addend for RELA) and the symbol address (obtainable from the upper part of r_info).
Actually, the write step has been simplified here. Unlike other architectures, where the immediate field of an instruction usually occupies bytes entirely separate from the ones that encode the operation, in ARM the immediate value is mixed with other encoding information.
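A minimal sketch of this simplified loop, for the data case where the computed value is written as a plain 64-bit word. The function name apply_rela, the symbol table modelled as a list of addresses, and the sample values are all hypothetical; only the R_AARCH64_ABS64 type (symbol address plus addend) is handled, r_offset is treated as a virtual address, and a little-endian image is assumed.

```python
import struct

R_AARCH64_ABS64 = 257  # value = symbol address + addend, stored as 64 bits


def apply_rela(image, base, relocs, symbol_addrs):
    """Patch `image` (a bytearray loaded at address `base`) in place.

    relocs is an iterable of (r_offset, r_info, r_addend) tuples and
    symbol_addrs maps a symbol index to its runtime address.
    """
    for r_offset, r_info, r_addend in relocs:
        sym_index = r_info >> 32          # upper part of r_info
        rel_type = r_info & 0xFFFFFFFF    # lower part of r_info
        if rel_type != R_AARCH64_ABS64:
            raise NotImplementedError(f"relocation type {rel_type}")
        value = (symbol_addrs[sym_index] + r_addend) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", image, r_offset - base, value)


# Hypothetical 16-byte data section at 0x1000; symbol 1 lives at 0x400000.
image = bytearray(16)
apply_rela(image, 0x1000, [(0x1008, (1 << 32) | R_AARCH64_ABS64, 4)],
           [0, 0x400000])
print(image.hex())
```

Relocations that target instructions need an extra step, because the value has to be merged into the instruction word rather than stored whole.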
So the loader needs to know what kind of instruction it is trying to relocate, if it is an instruction at all [1]. Instead of having the loader disassemble the relocation site, the assembler sets the relocation type according to the instruction.
Each relocation type can relocate only one or two encoding-equivalent instructions.
In specific cases the relocation itself even changes the type of the instruction.
The value computed during the relocation is implicitly extended to 64 bits, signed or unsigned depending on the relocation type chosen.
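To illustrate why the relocation type has to match the instruction, here is a sketch of patching the 16-bit immediate of a movz/movk/movn instruction word. The field position (bits 5 to 20) comes from the A64 move-wide encoding; the helper name patch_imm16 is hypothetical.

```python
IMM16_SHIFT = 5
IMM16_MASK = 0xFFFF << IMM16_SHIFT


def patch_imm16(insn, value):
    """Return the 32-bit instruction word with its imm16 field replaced."""
    kept = insn & ~IMM16_MASK & 0xFFFFFFFF   # opcode, shift and register bits
    return kept | ((value & 0xFFFF) << IMM16_SHIFT)


MOVZ_X0_0 = 0xD2800000  # encoding of: movz x0, #0
patched = patch_imm16(MOVZ_X0_0, 0x1234)
print(hex(patched))
```

A relocation meant for a different instruction class would place its bits in a different field, which is why the loader cannot treat the site as a plain number.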
AArch64 relocation
Since ARM is a RISC architecture with a fixed instruction size, loading a full-width (64-bit) immediate into a register is non-trivial, as no instruction can have a full-width immediate field.
Relocation in AArch64 has to address this issue too. It is actually a two-fold problem: first, find the real value that the programmer intended to use (this is the pure relocation part of the problem); second, find a way to put it into a register, since no instruction has a 64-bit immediate field.
The second issue is addressed by using group relocations. Each relocation type in a group computes a 16-bit part of the 64-bit value, so there can be only four relocation types in a group (ranging from G0 to G3).
This slicing into 16 bits is made to fit movk (move keeping), movz (move zeroing) and movn (move negating logically).
Other instructions, like b, bl, adrp, adr and so on, have a relocation type specially suited for them.
Whenever there is only one possible, and thus unambiguous, relocation type for a given instruction that references a symbol, the assembler can generate the corresponding entry without the programmer having to specify it explicitly.
Group relocations don't fit into this category: they exist to give the programmer some flexibility, and thus are generally stated explicitly. Within a group, a relocation type can specify whether the assembler must perform an overflow check or not.
A G0 relocation, used to load the lower 16 bits of a value, checks, unless explicitly suppressed, that the value fits in 16 bits (signed or unsigned, depending on the specific type used). The same is true for G1, which loads bits 31-16 and checks that the value fits in 32 bits.
As a consequence, G3 is always non-checking, as every value fits in 64 bits.
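The slicing and the overflow checks can be sketched as follows, for the unsigned case. The function name uabs_group is hypothetical; the bounds generalise the rule stated above, so that a checking Gn relocation requires the value to fit in 16*(n+1) bits.

```python
def uabs_group(value, n, check=True):
    """Return bits [16n+15 : 16n] of value, as a Gn relocation would.

    With check=True the value must fit in 16*(n+1) unsigned bits.
    G3 (n == 3) never needs a check, since every value fits in 64 bits.
    """
    if check and n < 3 and not 0 <= value < (1 << (16 * (n + 1))):
        raise OverflowError(f"value does not fit G{n}")
    return (value >> (16 * n)) & 0xFFFF


print(hex(uabs_group(0x12345678, 1)))               # checked, bits 31-16
print(hex(uabs_group(0x12345678, 0, check=False)))  # unchecked, bits 15-0
```

Calling uabs_group(0x12345678, 0) with the default check raises OverflowError, which models the error a checking G0 relocation would report for a value wider than 16 bits.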
Finally, relocations can be used to load integer values into a register. In fact, the address of a symbol is nothing more than an arbitrary integer constant.
Note that r_addend is 64 bits wide.
[1] If r_offset points to a site in a data section, the computed value is written as a 64-bit word at the location indicated.
Relocation operators
First of all, some references:
- The ARM document that describes the relocation types for the ELF64 format is here, section 4.6
- A test AArch64 assembly file that, presumably, contains all the relocation operators available to GAS is here
Conventions
Following the ARM document convention we have:
Operators
The relocation names are missing the R_AARCH64_ prefix for the sake of compactness.
Expressions of the kind |X|≤2^16 are intended as -2^16 ≤ X < 2^16; note the strict inequality on the right.
This is an abuse of notation, called for by the constraints of formatting a table.
Group relocations
In the table the ABS version is shown; the assembler can pick the PREL (PC-relative) or the GOTOFF (GOT-relative) version depending on the symbol referenced and the type of output format.
A typical use of these relocation operators is a movz that sets the highest part, followed by one movk for each lower part.
Usually only one checking operator is used, the one that sets the highest part.
That's why the checking versions relocate movz only, while the non-checking versions relocate movk (which sets only part of a register).
G3 relocates both because it is intrinsically non-checking, as no value can exceed 64 bits.
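The effect of a movz/movk sequence using these operators can be emulated in a few lines. This is only a sketch: the movz and movk helpers model the register semantics, the comments show the operator that would select each slice, and the symbol name sym and the sample value are placeholders.

```python
MASK64 = (1 << 64) - 1


def movz(imm16, shift):
    """Move zeroing: the whole register becomes the shifted immediate."""
    return (imm16 << shift) & MASK64


def movk(reg, imm16, shift):
    """Move keeping: replace one 16-bit slice, keep the other bits."""
    return (reg & ~(0xFFFF << shift) & MASK64) | (imm16 << shift)


def chunk(value, n):
    """16-bit slice n of value (what a Gn relocation extracts)."""
    return (value >> (16 * n)) & 0xFFFF


value = 0x0123456789ABCDEF          # stands in for the address of sym
x0 = movz(chunk(value, 3), 48)      # movz x0, #:abs_g3:sym
x0 = movk(x0, chunk(value, 2), 32)  # movk x0, #:abs_g2_nc:sym
x0 = movk(x0, chunk(value, 1), 16)  # movk x0, #:abs_g1_nc:sym
x0 = movk(x0, chunk(value, 0), 0)   # movk x0, #:abs_g0_nc:sym
print(hex(x0))
```

For a value known to fit in 32 bits, the same pattern would start from a checking G1 movz and finish with a single non-checking G0 movk.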
The signed versions end with _s and they are always checking.
There is no G3 version because, if a 64-bit value is used, the sign is fully specified in the value itself.
They are always used only to set the highest part, as the sign is relevant only there.
They are always checking, as an overflow in a signed value makes the value meaningless.
These relocations change the type of the instruction to movn or movz based on the sign of the value; this effectively sign-extends the value.
Group relocations are also available
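The movn/movz selection made by the signed relocations can be sketched like this. The helper names sabs_group and execute are hypothetical, and the register model is deliberately simplified to a 64-bit integer.

```python
MASK64 = (1 << 64) - 1


def sabs_group(value, n):
    """Model a signed Gn relocation: return (mnemonic, imm16).

    The check is the signed bound -2^(16(n+1)) <= value < 2^(16(n+1)).
    """
    limit = 1 << (16 * (n + 1))
    if not -limit <= value < limit:
        raise OverflowError(f"value does not fit signed G{n}")
    if value >= 0:
        return "movz", (value >> (16 * n)) & 0xFFFF
    # Negative: encode the inverted bits and let movn invert them back.
    return "movn", (~value >> (16 * n)) & 0xFFFF


def execute(mnemonic, imm16, shift):
    """Register value after a movz or movn with the given immediate."""
    word = imm16 << shift
    return word if mnemonic == "movz" else ~word & MASK64


# Models: movz x0, #:abs_g0_s:sym  where sym resolves to -5
op, imm = sabs_group(-5, 0)
print(op, imm, hex(execute(op, imm, 0)))
```

Because movn fills every untouched bit with ones, the register ends up holding the sign-extended value without any further instruction.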
PC-relative, 19, 21, 33 bits addresses
The :lo12: operator changes meaning depending on the size of the data the instruction is handling (e.g. ldrb uses LDST8_ABS_LO12_NC, ldrh uses LDST16_ABS_LO12_NC).
A GOT-relative version of these relocations also exists; the assembler will pick the right one.
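The split performed by an adrp with :pg_hi21: followed by an instruction using :lo12: can be sketched as follows. S, A and P stand for the symbol address, the addend and the address of the instruction being patched, following the ARM document's notation; the function names, the sample addresses and the alignment check are assumptions of this sketch.

```python
PAGE_MASK = 0xFFF


def page(addr):
    """Address of the 4 KiB page containing addr."""
    return addr & ~PAGE_MASK


def adrp_imm(S, A, P):
    """Page delta for :pg_hi21: (ADR_PREL_PG_HI21), counted in pages."""
    delta = page(S + A) - page(P)
    if not -(1 << 32) <= delta < (1 << 32):
        raise OverflowError("target is out of adrp range")
    return delta >> 12


def lo12_imm(S, A, size=1):
    """Immediate for :lo12: with an access of `size` bytes.

    size=1 matches add and ldrb, size=2 matches ldrh, and so on: the
    load/store immediate is scaled by the access size.
    """
    low = (S + A) & PAGE_MASK
    if low % size:
        # Assumption of this sketch: reject misaligned targets.
        raise ValueError("target is not aligned for this access size")
    return low // size


S, A, P = 0x412348, 0, 0x400010
# adrp gives the page, then a halfword load adds the scaled low 12 bits.
target = page(P) + (adrp_imm(S, A, P) << 12) + lo12_imm(S, A, 2) * 2
print(hex(target))
```

The 21-bit signed page count combined with the 12-bit page offset is what gives the pair its 33-bit reach.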
Control flow relocations
Epilogue
I couldn't find any official documentation.
The tables above have been reconstructed from the GAS test case and the ARM document explaining the types of relocation available for AArch64-compliant ELFs.
The tables don't show all the relocations present in the ARM document, as most of them are complementary versions, picked by the assembler automatically.
A section with examples would be great, but I don't have an ARM GAS.
In the future I may extend this answer to include examples of assembly listings and relocation dumps.