Difference in ARM and x86 assembly code generated

Posted 2019-08-10 22:16

Question:

Consider a simple piece of C code that writes to a register at a fixed address:

int main()
{
    int *a = (int*)111111;
    *a = 0x1000;
    return 0;
}

When I compile this code for ARM (arm-none-eabi-gcc) with level 1 optimization, the assembly code is something like:

mov     r2, #4096
mov     r3, #110592
str     r2, [r3, #519]
mov     r0, #0
bx      lr

It looks like the address 111111 was rounded down to a 4K boundary (110592) and moved to r3, and then the value 4096 (0x1000) was stored at offset 519 from that base (110592 + 519 = 111111). Why does this happen?

In x86, the assembly is straightforward:

movl    $4096, 111111
movl    $0, %eax
ret

Answer 1:

The reason for this difference in encoding is that x86 has variable-size instructions, from 1 byte up to 15 bytes including prefixes.

ARM instructions are 32 bits wide (not counting Thumb modes), which means that it is simply not possible to encode every 32-bit constant (immediate) in a single opcode.

Fixed-size architectures typically use a few methods to load large constants:

1)  movi  #r1, Imm8  ; // Here Imm8 or ImmX is simply X least significant bits
2)  movhi #r1, Imm16 ; // Here Imm16 loads the 16 MSB of the register
3)  load  #r1, (PC + ImmX);  // use PC-relative address to put constant in code
4)  movn  #r1, Imm8 ;  // load the inverse of Imm8 (for signed constants) 
5)  mov(i/n) #r1, Imm8 << N;      // where N=0,8,16,24

Variable-size architectures, on the other hand, can put all the constants in a single instruction:

xx xx xx 07 b2 01 00 00 10 00 00 ; // assuming that it takes 3 bytes to encode
                                 ; // the instruction and the addressing mode,
; followed by 4 bytes for the address 111111 (0x0001B207) and 4 bytes for the
; value 4096 (0x00001000), both stored little-endian
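
To make that byte layout concrete, here is a minimal C sketch that assembles the bytes by hand. It assumes the x86-64 form of movl with an absolute 32-bit address (opcode C7, ModRM 04, SIB 25), where the address comes first and the immediate second. The function names are made up for this illustration and do not come from any assembler library.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value as four little-endian bytes. */
static void put_le32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Hypothetical helper: encode "movl $imm, addr" for x86-64 with an
   absolute 32-bit address. Returns the instruction length in bytes. */
size_t encode_movl_imm_to_abs(uint8_t out[11], uint32_t addr, uint32_t imm)
{
    out[0] = 0xC7;            /* opcode: MOV r/m32, imm32 */
    out[1] = 0x04;            /* ModRM: a SIB byte follows */
    out[2] = 0x25;            /* SIB: no base, no index, disp32 */
    put_le32(out + 3, addr);  /* 4-byte absolute address */
    put_le32(out + 7, imm);   /* 4-byte immediate value */
    return 11;
}
```

Calling encode_movl_imm_to_abs(buf, 111111, 4096) fills an 11-byte buffer: three bytes of opcode and addressing mode, then both constants in full.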


Answer 2:

The address had to be split in two parts because this specific constant cannot be loaded into a register with a single instruction.

The ARM documentation specifies limitations for the immediate constants allowed in some instructions (such as MOV):

In ARM instructions, constant can have any value that can be produced by rotating an 8-bit value right by any even number of bits within a 32-bit word.

In 32-bit Thumb-2 instructions, constant can be:

Any constant that can be produced by shifting an 8-bit value left by any number of bits within a 32-bit word.

Any constant of the form 0x00XY00XY.
Any constant of the form 0xXY00XY00.
Any constant of the form 0xXYXYXYXY.

The value 111111 (0x1B207) can't be represented as any of the above, so the compiler had to split it.
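
The rules quoted above can be checked mechanically. What follows is a minimal C sketch, not code from the ARM documentation or from GCC. The function names are made up for illustration, and the Thumb-2 check covers only the four forms listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits. */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* ARM mode: an 8-bit value rotated right by an even amount.
   Undo each possible rotation and see whether the result fits in 8 bits. */
bool is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* Thumb-2, per the forms quoted above (illustrative only). */
bool is_thumb2_immediate(uint32_t v)
{
    uint32_t lo = v & 0xFFu;          /* candidate XY in byte 0 */
    uint32_t hi = (v >> 8) & 0xFFu;   /* candidate XY in byte 1 */

    if (v == (lo | (lo << 16)))        return true;  /* 0x00XY00XY */
    if (v == ((hi << 8) | (hi << 24))) return true;  /* 0xXY00XY00 */
    if (v == lo * 0x01010101u)         return true;  /* 0xXYXYXYXY */

    /* An 8-bit value shifted left by 0..24 bits. */
    for (unsigned sh = 0; sh <= 24; sh++) {
        uint32_t top = v >> sh;
        if (top <= 0xFFu && (top << sh) == v)
            return true;
    }
    return false;
}
```

With these helpers, is_arm_immediate(111111) and is_thumb2_immediate(111111) both return false, while is_arm_immediate(110592) and is_arm_immediate(4096) return true.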

110592 is 0x1B000, so it fulfills the first condition (an 8-bit value 0x1B rotated left by 12 bits) and can be loaded using a MOV instruction.

The STR instruction, on the other hand, has a different set of limitations for the offsets used. In particular, 519 (0x207) falls into the -4095 to 4095 range allowed for the word store/load in ARM mode.
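
The split described above can be reproduced with a few lines of C. This is only a sketch under a stated assumption: it masks off the low 12 bits as the offset and checks that the remaining base is a valid ARM-mode immediate. GCC's real constant-splitting logic is more general (it can also choose negative offsets), and the function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is an 8-bit value rotated right by an even number of bits. */
static bool is_arm_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotate left by rot to undo a right rotation by rot. */
        uint32_t r = rot ? (v << rot) | (v >> (32u - rot)) : v;
        if (r <= 0xFFu)
            return true;
    }
    return false;
}

/* Assumed strategy: base = addr with the low 12 bits cleared,
   offset = the low 12 bits (always within 0..4095).
   Returns false if the base cannot be loaded with a single MOV. */
bool split_address(uint32_t addr, uint32_t *base, uint32_t *offset)
{
    uint32_t b = addr & ~0xFFFu;
    if (!is_arm_immediate(b))
        return false;
    *base = b;
    *offset = addr & 0xFFFu;
    return true;
}
```

For addr = 111111 this yields base 110592 and offset 519, matching the mov/str pair in the question. For 0xABCDEF78 it returns false, because the base 0xABCDE000 does not fit in a single MOV.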


In this specific case the compiler managed to split the constant in only two parts. If your immediate has more bits, it may have to generate even more instructions, or use a literal pool load. For example, if I use 0xABCDEF78, I get this (for ARMv7):

movw    r3, #61439
movt    r3, 43981
mov     r2, #4096
str     r2, [r3, #-135]
mov     r0, #0
bx      lr
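
To see how those numbers add up, the arithmetic can be written out in C. This is only a sketch of the computation the instructions perform, with made-up function names: MOVW sets the low 16 bits, MOVT sets the high 16 bits, and STR adds the signed offset to the base register.

```c
#include <stdint.h>

/* Value left in the register after "movw rX, #low" and "movt rX, #high". */
uint32_t movw_movt(uint16_t low, uint16_t high)
{
    return ((uint32_t)high << 16) | low;
}

/* Address accessed by "str rY, [rX, #offset]". */
uint32_t effective_address(uint32_t base, int32_t offset)
{
    return base + (uint32_t)offset;  /* wraps modulo 2^32 */
}
```

Here movw_movt(61439, 43981) gives 0xABCDEFFF, and effective_address(0xABCDEFFF, -135) gives 0xABCDEF78, the address that was originally requested.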

For architectures without MOVW/MOVT (e.g. ARMv4), GCC seems to fall back to a literal pool:

    mov     r2, #4096
    ldr     r3, .L2
    str     r2, [r3, #-135]
    mov     r0, #0
    bx      lr
.L3:
    .align  2
.L2:
    .word   -1412567041
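
The negative number in the literal pool is the same 32-bit base address as before, printed as a signed integer. A short C sketch (the function name is hypothetical) shows the conversion:

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit literal-pool word as an unsigned address. */
uint32_t word_as_address(int32_t word)
{
    return (uint32_t)word;  /* conversion is modulo 2^32 */
}
```

word_as_address(-1412567041) is 0xABCDEFFF, and subtracting the 135 from the STR offset gives 0xABCDEF78 again.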


Answer 3:

The compiler is probably taking advantage of ARM immediate value encoding to reduce code size. 110592 is 0x1B << 12, which fits in a single MOV immediate, and the remaining 519 fits in the offset field of STR, so no extra instruction or literal load is needed. Take a look at the output of arm-none-eabi-objdump -d for your program to check the length of each instruction.