I need help understanding endianness inside CPU registers of x86 processors. I wrote this small assembly program:
section .data
section .bss
section .text
    global _start
_start:
    nop
    mov eax, 0x78FF5ABC
    mov ebx, 'WXYZ'
    nop             ; GDB breakpoint here.
    mov eax, 1
    mov ebx, 0
    int 0x80
I ran this program in GDB with a breakpoint on line number 10 (commented in the source above). At this breakpoint, info registers shows eax=0x78ff5abc and ebx=0x5a595857.
Since the ASCII codes for W, X, Y, Z are 0x57, 0x58, 0x59, 0x5A respectively, and Intel is little-endian, 0x5a595857 seems like the correct byte order (least significant byte first). Why, then, isn't the output for the eax register 0xbc5aff78 (least significant byte of the number 0x78ff5abc first) instead of 0x78ff5abc?
In simple words, treat registers as just values; the endianness of how they are finally stored is not important.
You know that writing to eax writes a 32-bit number, and you know that reading from eax will read back the same 32-bit number. In these terms, endianness doesn't matter.
Then you know that "al" holds the least significant 8 bits of the value, and "ah" holds the most significant 8 bits of the lower 16 bits. There is no way to access the individual bytes of the upper 16 bits, except of course by reading the whole 32-bit value.
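As a rough illustration of the "registers are just values" view, here is a small Python sketch that splits a 32-bit value into the parts x86 calls ax, al and ah using only shifts and masks. The helper name sub_registers is made up for this example; no memory layout is involved anywhere.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit value into the named sub-registers (hypothetical helper)."""
    eax &= 0xFFFFFFFF                 # keep only the low 32 bits
    return {
        "ax": eax & 0xFFFF,           # lower 16 bits
        "al": eax & 0xFF,             # bits 0-7, least significant byte
        "ah": (eax >> 8) & 0xFF,      # bits 8-15, upper half of ax
    }

parts = sub_registers(0x78FF5ABC)
print({name: hex(value) for name, value in parts.items()})
# {'ax': '0x5abc', 'al': '0xbc', 'ah': '0x5a'}
```

The result depends only on the numeric value, which is why byte order never enters the picture at this level.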
The assembler is handling the two constants differently. Conceptually, you can think of a value in the EAX register as being held in big-endian order, most significant byte first. You can see that by writing mov eax, 1: if you inspect the register, you'll see that its value is 0x00000001.
When you tell the assembler that you want the constant value 0x78ff5abc, that's exactly what gets stored in the register. The high 8 bits of EAX will contain 0x78, and the AL register contains 0xbc.
Now if you were to store the value from EAX into memory, it would be laid out in memory in the reverse order. That is, if you were to write mov [addr], eax and then inspected memory at [addr], you would see 0xbc, 0x5a, 0xff, 0x78.
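The store to memory described above can be imitated in Python. This is only a sketch: int.to_bytes with byteorder="little" stands in for what an x86 CPU does when it writes a 32-bit register to memory, and the variable names are chosen for this example.

```python
eax = 0x78FF5ABC

# Simulate writing the 32-bit value to memory on a little-endian machine:
# the least significant byte lands at the lowest address.
memory = eax.to_bytes(4, byteorder="little")
print(memory.hex(" "))        # bc 5a ff 78

# Loading the same four bytes back gives the original value.
reloaded = int.from_bytes(memory, byteorder="little")
print(hex(reloaded))          # 0x78ff5abc
```

The value printed for the register never changes; only the byte-by-byte view of memory is reversed.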
In the case of 'WXYZ', the assembler assumes that you want to load the value such that if you were to write it to memory, it would be laid out as 0x57, 0x58, 0x59, 0x5a.
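To make the two cases comparable, the sketch below builds the bytes of a 32-bit-mode mov eax, imm32 instruction (opcode B8 followed by the immediate in little-endian order) for both constants. The function name encode_mov_eax_imm32 is invented for this example, and the 'WXYZ' value assumes NASM's documented rule that the first character of a character constant becomes the least significant byte.

```python
def encode_mov_eax_imm32(imm: int) -> bytes:
    """Encode `mov eax, imm32` (32-bit mode): opcode 0xB8, then imm32 little-endian."""
    return bytes([0xB8]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")

number = 0x78FF5ABC
# Assumption: NASM-style character constant, first character in the lowest byte.
chars = int.from_bytes(b"WXYZ", "little")

print(hex(chars))                               # 0x5a595857
print(encode_mov_eax_imm32(number).hex(" "))    # b8 bc 5a ff 78
print(encode_mov_eax_imm32(chars).hex(" "))     # b8 57 58 59 5a
```

In both cases the immediate is stored the same way; what differs is the number the assembler derives from the source text.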
Take a look at the code bytes that the assembler generates and you'll see the difference. In the case of mov eax,0x78ff5abc, you'll see the immediate encoded as bc 5a ff 78. In the case of mov eax,'WXYZ', you'll see 57 58 59 5a.
Endianness inside a register makes no sense, since endianness describes whether the byte order runs from low to high memory address or from high to low memory address. Registers are not byte-addressable, so there is no low or high address within a register. What you are seeing is how your debugger prints out the data.
Endianness makes sense only for memory, where each byte has a numeric address. When the MSByte of a value is put at a higher memory address than the LSByte, it's called little-endian, and this is the endianness of every x86 processor.
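A small Python sketch can show the address-to-byte mapping for both conventions. The base address 0x1000 and the helper memory_layout are assumptions made up for illustration; only the relative order of the bytes matters.

```python
def memory_layout(value: int, base: int, byteorder: str) -> dict:
    """Map each address to the byte stored there for a 32-bit value (hypothetical helper)."""
    data = value.to_bytes(4, byteorder)
    return {base + offset: byte for offset, byte in enumerate(data)}

BASE = 0x1000  # arbitrary example address
little = memory_layout(0x78FF5ABC, BASE, "little")
big = memory_layout(0x78FF5ABC, BASE, "big")

for address in sorted(little):
    print(f"{address:#06x}: little={little[address]:02x} big={big[address]:02x}")
```

In the little-endian column the MSByte 0x78 sits at the highest address (0x1003), which matches the definition given above.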
While for integers the distinction between LSByte and MSByte is clear (in 0x78ff5abc, 0x78 is the MSByte and 0xbc is the LSByte), it's not defined for string literals! It's not obvious which part of 'WXYZ' should be considered the LSB or the MSB:
1) The most obvious way, with W as the MSB and Z as the LSB, would lead to memory order ZYXW.
2) The not so obvious way, with W as the LSB and Z as the MSB, so that the memory order matches the order of the literal: WXYZ.
The assembler has to choose one of them, and apparently it chooses the second.
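The two possible readings of the literal can be compared with a short Python sketch. The variable names are illustrative only, and little-endian storage is simulated with int.to_bytes rather than real hardware.

```python
literal = b"WXYZ"

# Option 1: first character is the most significant byte.
first_char_high = int.from_bytes(literal, "big")
# Option 2: first character is the least significant byte.
first_char_low = int.from_bytes(literal, "little")

for name, value in (("option 1", first_char_high), ("option 2", first_char_low)):
    in_memory = value.to_bytes(4, "little")   # what a little-endian store would write
    print(f"{name}: value={value:#010x} memory={in_memory.decode('ascii')}")
# option 1: value=0x5758595a memory=ZYXW
# option 2: value=0x5a595857 memory=WXYZ
```

Option 2 yields 0x5a595857, the value GDB reported for ebx in the question.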