Endianness inside CPU registers

Posted 2019-02-03 19:56

Question:

I need help understanding endianness inside CPU registers of x86 processors. I wrote this small assembly program:

section .data
section .bss

section .text
    global _start
_start:
    nop
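    mov eax, 0x78FF5ABC  ; numeric immediate
    mov ebx,'WXYZ'       ; character constant
    nop  ; GDB breakpoint here.
    mov eax, 1           ; sys_exit (32-bit Linux system call number)
    mov ebx, 0           ; exit status
    int 0x80             ; enter the kernel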

I ran this program in GDB with a breakpoint on line number 10 (the nop commented in the source above). At this breakpoint, info registers shows eax=0x78ff5abc and ebx=0x5a595857.

The ASCII codes for W, X, Y and Z are 0x57, 0x58, 0x59 and 0x5A respectively, and Intel processors are little-endian, so 0x5a595857 looks like the correct byte order (least significant byte first). Why, then, isn't the output for the eax register 0xbc5aff78 (the number 0x78ff5abc with its least significant byte first) instead of 0x78ff5abc?

Answer 1:

Endianness makes sense only for memory, where each byte has a numeric address. When the most significant byte (MSByte) of a value is stored at a higher memory address than the least significant byte (LSByte), the layout is called little-endian, and this is the endianness of every x86 processor.
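To make that definition concrete, here is a minimal Python sketch that lays out a 32-bit value little-endian by hand, least significant byte at the lowest offset, and checks the result against the standard `struct` module's `'<I'` (little-endian unsigned 32-bit) format. The helper names `store_le32` and `load_le32` are hypothetical and chosen only for illustration.

```python
import struct

def store_le32(value: int) -> bytes:
    """Little-endian layout: the byte at offset i holds bits 8*i .. 8*i+7."""
    return bytes((value >> (8 * i)) & 0xFF for i in range(4))

def load_le32(data: bytes) -> int:
    """Inverse of store_le32: the byte at the highest offset is the most significant."""
    return sum(b << (8 * i) for i, b in enumerate(data[:4]))

value = 0x12345678
mem = store_le32(value)
print(mem.hex(" "))                     # prints: 78 56 34 12 (increasing addresses)
assert mem == struct.pack("<I", value)  # agrees with struct's little-endian format
assert load_le32(mem) == value          # round-trips to the same number
```

The LSByte (0x78) sits at the lowest offset and the MSByte (0x12) at the highest, which is exactly the "MSByte at the higher address" rule.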

While for integers the distinction between LSByte and MSByte is clear:

    0x12345678
MSB---^^    ^^---LSB

It's not defined for string literals (character constants) like 'WXYZ'! It's not obvious which part of WXYZ should be considered the LSB or the MSB:

1) The most obvious way,

'WXYZ' ->  0x5758595A

would lead to memory order ZYXW.

2) The not-so-obvious way, where the memory order matches the order of the characters:

'WXYZ' ->  0x5A595857

The assembler has to choose one of them, and apparently it chooses the second.
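The two candidate rules can be compared directly with a minimal Python sketch (the function names are hypothetical). Option 1 treats the first character as the most significant byte. Option 2 puts the first character at the lowest address. Storing each result little-endian, as x86 does, shows why only option 2 keeps WXYZ in source order in memory. Option 2 also produces 0x5a595857, the value GDB reported for ebx.

```python
def pack_first_char_msb(s: str) -> int:
    """Option 1: the first character becomes the most significant byte."""
    value = 0
    for ch in s:
        value = (value << 8) | ord(ch)
    return value

def pack_first_char_lowest_address(s: str) -> int:
    """Option 2: the first character lands at the lowest address when stored little-endian."""
    return int.from_bytes(s.encode("ascii"), "little")

opt1 = pack_first_char_msb("WXYZ")
opt2 = pack_first_char_lowest_address("WXYZ")
print(hex(opt1), opt1.to_bytes(4, "little"))  # prints: 0x5758595a b'ZYXW'
print(hex(opt2), opt2.to_bytes(4, "little"))  # prints: 0x5a595857 b'WXYZ'
```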



Answer 2:

Endianness inside a register makes no sense, since endianness describes whether bytes are ordered from low to high memory addresses or from high to low. Registers are not byte-addressable, so there is no low or high address within a register. What you are seeing is simply how your debugger prints the value: as a number, most significant digit first.
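The following Python sketch is an analogy, not a model of the CPU. It shows that working with a value never exposes byte order; only reinterpreting the value as a sequence of addressed bytes does. Shifting and masking 0x78FF5ABC gives the same low and high bytes on any machine. Serializing the value with `struct.pack("=I", ...)` (native byte order) produces a first byte that depends on `sys.byteorder`.

```python
import struct
import sys

value = 0x78FF5ABC

# Value-level view (like reading a register): no addresses, so no byte order.
low_byte = value & 0xFF            # always 0xbc
high_byte = (value >> 24) & 0xFF   # always 0x78

# Memory-level view: bytes in the native order of the machine running this script.
native = struct.pack("=I", value)
expected_first = low_byte if sys.byteorder == "little" else high_byte

print(sys.byteorder, native.hex(" "))
assert native[0] == expected_first
```

On an x86 machine this prints `little bc 5a ff 78`. The order only appears in the second half, once the number has been turned into bytes.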



Answer 3:

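The assembler is handling the two constants differently. Inside the register there is no byte order to speak of: EAX simply holds a 32-bit number, and the debugger displays it most significant digit first, the way numbers are normally written. You can see that by writing: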

mov eax, 1

If you inspect the register, you'll see that its value is 0x00000001.

When you tell the assembler that you want the constant value 0x78ff5abc, that's exactly what gets stored in the register. The high 8 bits of EAX will contain 0x78, and the AL register contains 0xbc.

Now if you were to store the value from EAX into memory, it would be laid out in memory in the reverse order. That is, if you were to write:

mov [addr],eax

And then inspected memory starting at addr, you would see 0xbc, 0x5a, 0xff, 0x78 (in increasing address order).

In the case of 'WXYZ', the assembler assumes that you want to load the value such that if you were to write it to memory, it would be laid out as 0x57, 0x58, 0x59, 0x5a.

Take a look at the code bytes that the assembler generates and you'll see the difference. In the case of mov eax,0x78ff5abc, you'll see:

<opcodes for mov eax>, 0xbc, 0x5a, 0xff, 0x78

In the case of mov eax,'WXYZ', you'll see:

<opcodes for mov eax>, 0x57, 0x58, 0x59, 0x5a
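The code bytes described above can be checked without running an assembler using a minimal Python sketch that builds the machine code by hand. It assumes the standard one-byte `MOV r32, imm32` form (opcode 0xB8 plus the register number, so 0xB8 for EAX and 0xBB for EBX) followed by the 4-byte immediate in little-endian order. It also assumes the question's code is assembled with NASM, whose manual states that multi-character constants are arranged with little-endian order in mind. The helpers `encode_mov_r32_imm32` and `nasm_char_const` are hypothetical.

```python
import struct

# x86 register numbers used in the "0xB8 + register" opcode form.
REG_NUM = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    """MOV r32, imm32: one opcode byte (0xB8 + reg), then imm32 little-endian."""
    return bytes([0xB8 + REG_NUM[reg]]) + struct.pack("<I", imm)

def nasm_char_const(s: str) -> int:
    """NASM-style character constant: the first character ends up in the lowest byte."""
    return int.from_bytes(s.encode("ascii"), "little")

print(encode_mov_r32_imm32("eax", 0x78FF5ABC).hex(" "))          # prints: b8 bc 5a ff 78
print(encode_mov_r32_imm32("ebx", nasm_char_const("WXYZ")).hex(" "))  # prints: bb 57 58 59 5a
```

The second line corresponds to the question's `mov ebx,'WXYZ'`. In both cases the immediate is emitted little-endian. The difference lies entirely in which number the assembler chose for each constant.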


Answer 4:

In simple terms, treat registers as plain values. How they might ultimately be stored as bytes is not important, so endianness does not come into it.

When you write to eax you write a 32-bit number, and when you read from eax you get the same 32-bit number back. In these terms, endianness doesn't matter.

You also know that al holds the least significant 8 bits of the value, and ah holds the most significant 8 bits of the lower 16 bits. No register name covers the individual bytes of the upper 16 bits. To reach them you have to work with the whole 32-bit value (for example, by shifting it right).
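That naming can be modeled in Python and applied to the value from the question (the function names are hypothetical). AX, AH and AL are fixed bit ranges of the number. The two upper bytes can only be reached by shifting and masking the full 32-bit value. Because everything is defined with shifts, the result does not depend on how any hardware stores bytes.

```python
def subregisters(eax: int) -> dict:
    """Split a 32-bit EAX value into the parts x86 lets you name directly."""
    eax &= 0xFFFFFFFF
    return {
        "ax": eax & 0xFFFF,        # low 16 bits
        "ah": (eax >> 8) & 0xFF,   # high byte of AX
        "al": eax & 0xFF,          # low byte of AX
    }

def upper_bytes(eax: int) -> tuple:
    """Bytes 2 and 3 have no register names; shift and mask the full value instead."""
    return ((eax >> 16) & 0xFF, (eax >> 24) & 0xFF)

print({k: hex(v) for k, v in subregisters(0x78FF5ABC).items()})
# prints: {'ax': '0x5abc', 'ah': '0x5a', 'al': '0xbc'}
print([hex(b) for b in upper_bytes(0x78FF5ABC)])
# prints: ['0xff', '0x78']
```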