Types of endianness

2019-01-16 21:32发布

问题:

What is the difference between the following types of endianness?

  • byte (8b) invariant big and little endianness
  • half-word (16b) invariant big and little endianness
  • word (32b) invariant big and little endianness
  • double-word (64b) invariant big and little endianness

Are there other types/variations?

回答1:

There are two approaches to endian mapping: address invariance and data invariance.

Address Invariance

In this type of mapping, the address of bytes is always preserved between big and little. This has the side effect of reversing the order of significance (most significant to least significant) of a particular datum (e.g. 2 or 4 byte word) and therefore the interpretation of data. Specifically, in little-endian, the interpretation of data is least-significant to most-significant bytes whilst in big-endian, the interpretation is most-significant to least-significant. In both cases, the set of bytes accessed remains the same.

Example

Address invariance (also known as byte invariance): the byte address is constant but byte significance is reversed.

Addr   Memory
       7    0
       |    |    (LE)   (BE)
       |----|
 +0    | aa |    lsb    msb
       |----|
 +1    | bb |     :      :
       |----|
 +2    | cc |     :      :
       |----|
 +3    | dd |    msb    lsb
       |----|
       |    |

At Addr=0:          Little-endian          Big-endian
Read 1 byte:              0xaa                0xaa   (preserved)
Read 2 bytes:           0xbbaa              0xaabb
Read 4 bytes:       0xddccbbaa          0xaabbccdd

Data Invariance

In this type of mapping, the relative byte significance is preserved for datum of a particular size. There are therefore different types of data invariant endian mappings for different datum sizes. For example, a 32-bit word invariant endian mapping would be used for a datum size of 32. The effect of preserving the value of particular sized datum, is that the byte addresses of bytes within the datum are reversed between big and little endian mappings.

Example

32-bit data invariance (also known as word invariance): The datum is a 32-bit word which always has the value 0xddccbbaa, independent of endianness. However, for accesses smaller than a word, the address of the bytes are reversed between big and little endian mappings.

Addr                Memory

            | +3   +2   +1   +0 |  <- LE
            |-------------------|
+0      msb | dd | cc | bb | aa |  lsb
            |-------------------|
+4      msb | 99 | 88 | 77 | 66 |  lsb
            |-------------------|
     BE ->  | +0   +1   +2   +3 |


At Addr=0:             Little-endian              Big-endian
Read 1 byte:                 0xaa                    0xdd
Read 2 bytes:              0xbbaa                  0xddcc
Read 4 bytes:          0xddccbbaa              0xddccbbaa   (preserved)
Read 8 bytes:  0x99887766ddccbbaa      0x99887766ddccbbaa   (preserved)

Example

16-bit data invariance (also known as half-word invariance): The datum is a 16-bit which always has the value 0xbbaa, independent of endianness. However, for accesses smaller than a half-word, the address of the bytes are reversed between big and little endian mappings.

Addr           Memory

            | +1   +0 |  <- LE
            |---------|
+0      msb | bb | aa |  lsb
            |---------|
+2      msb | dd | cc |  lsb
            |---------|
+4      msb | 77 | 66 |  lsb
            |---------|
+6      msb | 99 | 88 |  lsb
            |---------|
     BE ->  | +0   +1 |


At Addr=0:             Little-endian              Big-endian
Read 1 byte:                 0xaa                    0xbb
Read 2 bytes:              0xbbaa                  0xbbaa   (preserved)
Read 4 bytes:          0xddccbbaa              0xddccbbaa   (preserved)
Read 8 bytes:  0x99887766ddccbbaa      0x99887766ddccbbaa   (preserved)

Example

64-bit data invariance (also known as double-word invariance): The datum is a 64-bit word which always has the value 0x99887766ddccbbaa, independent of endianness. However, for accesses smaller than a double-word, the address of the bytes are reversed between big and little endian mappings.

Addr                         Memory

            | +7   +6   +5   +4   +3   +2   +1   +0 |  <- LE
            |---------------------------------------|
+0      msb | 99 | 88 | 77 | 66 | dd | cc | bb | aa |  lsb
            |---------------------------------------|
     BE ->  | +0   +1   +2   +3   +4   +5   +6   +7 |


At Addr=0:             Little-endian              Big-endian
Read 1 byte:                 0xaa                    0x99
Read 2 bytes:              0xbbaa                  0x9988
Read 4 bytes:          0xddccbbaa              0x99887766
Read 8 bytes:  0x99887766ddccbbaa      0x99887766ddccbbaa   (preserved)


回答2:

There's also middle or mixed endian. See wikipedia for details.

The only time I had to worry about this was when writing some networking code in C. Networking typically uses big-endian IIRC. Most languages either abstract the whole thing or offer libraries to guarantee that you're using the right endian-ness though.



回答3:

Philibert said,

bits were actually inverted

I doubt any architecture would break byte value invariance. The order of bit-fields may need inversion when mapping structs containing them against data. Such direct mapping relies on compiler specifics which are outside the C99 standard but which may still be common. Direct mapping is faster but does not comply with the C99 standard that does not stipulate packing, alignment and byte order. C99-compliant code should use slow mapping based on values rather than addresses. That is, instead of doing this,

#if LITTLE_ENDIAN
  struct breakdown_t {
    int least_significant_bit: 1;
    int middle_bits: 10;
    int most_significant_bits: 21;
  };
#elif BIG_ENDIAN
  struct breakdown_t {
    int most_significant_bits: 21;
    int middle_bits: 10;
    int least_significant_bit: 1;
  };
#else
  #error Huh
#endif

uint32_t data = ...;
struct breakdown_t *b = (struct breakdown_t *)&data;

one should write this (and this is how the compiler would generate code anyways even for the above "direct mapping"),

uint32_t data = ...;
uint32_t least_significant_bit = data & 0x00000001;
uint32_t middle_bits = (data >> 1) & 0x000003FF;
uint32_t most_significant_bits = (data >> 11) & 0x001fffff;

The reason behind the need to invert the order of bit-fields in each endian-neutral, application-specific data storage unit is that compilers pack bit-fields into bytes of growing addresses.

The "order of bits" in each byte does not matter as the only way to extract them is by applying masks of values and by shifting to the the least-significant-bit or most-significant-bit direction. The "order of bits" issue would only become important in imaginary architectures with the notion of bit addresses. I believe all existing architectures hide this notion in hardware and provide only least vs. most significant bit extraction which is the notion based on the endian-neutral byte values.



回答4:

Actually, I'd describe the endianness of a machine as the order of bytes inside of a word, and not the order of bits.

By "bytes" up there I mean the "smallest unit of memory the architecture can manage individually". So, if the smallest unit is 16 bits long (what in x86 would be called a word) then a 32 bit "word" representing the value 0xFFFF0000 could be stored like this:

FFFF 0000

or this:

0000 FFFF

in memory, depending on endianness.

So, if you have 8-bit endianness, it means that every word consisting of 16 bits, will be stored as:

FF 00

or:

00 FF

and so on.



回答5:

Best article I read about endianness "Understanding Big and Little Endian Byte Order".



回答6:

Practically speaking, endianess refers to the way the processor will interpret the content of a given memory location. For example, if we have memory location 0x100 with the following content (hex bytes)


  0x100:  12 34 56 78 90 ab cd ef

Reads    Little Endian            Big Endian
 8-bit:  12                        12
16-bit:  34 12                     12 34
32-bit:  78 56 34 12               12 34 56 78
64-bit:  ef cd ab 90 78 56 34 12   12 34 56 78 90 ab cd ef

The two situations where you need to mind endianess are with networking code and if you do down casting with pointers.

TCP/IP specifies that data on the wire should be big endian. If you transmit types other than byte arrays (like pointers to structures), you should make sure to use the ntoh/hton macros to ensure the data is sent big endian. If you send from a little-endian processor to a big-endian processor (or vice versa), the data will be garbled...

Casting issues:


 uint32_t* lptr = 0x100;
 uint16_t  data;
 *lptr = 0x0000FFFF

 data = *((uint16_t*)lptr);

What will be the value of data? On a big-endian system, it would be 0 On a little-endian system, it would be FFFF



回答7:

13 years ago I worked on a tool portable to both a DEC ALPHA system and a PC. On this DEC ALPHA the bits were actually inverted. That is:

1010 0011

actually translated to

1100 0101

It was almost transparent and seamless in the C code except that I had a bitfield declared like

typedef struct {
   int firstbit:1;
   int middlebits:10;
   int lastbits:21;
};

that needed to be translated to (using #ifdef conditional compiling)

typedef struct {
   int lastbits:21;
   int middlebits:10;
   int firstbit:1;
};


回答8:

the basic concept is the ordering of bits:

1010 0011

in little-endian is the same as

0011 1010

in big-endian (and vice-versa).

You'll notice the order changes by grouping, not by individual bit. I don't know of a system, for example, where

1100 0101

would be the "other-endian" version of the first version.



标签: endianness