When does Endianness become a factor?

Posted 2019-01-21 02:21

Endianness, from what I understand, is the order in which the bytes that compose a multi-byte word are stored, at least in the most typical case. So a 16-bit integer may be stored as either 0xHHLL or 0xLLHH.

Assuming I don't have that wrong, what I would like to know is when endianness becomes a major factor when sending information between two computers whose endianness may or may not be different.

  • If I transmit a short integer of 1, in the form of a char array and with no correction, is it received and interpreted as 256?

  • If I decompose and recompose the short integer using the following code, will endianness no longer be a factor?

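    // Sender: copy each bit of the value into the bitset, least significant bit first.
    for (n = 0; n < sizeof(uint16) * 8; ++n) {
        stl_bitset[n] = (value >> n) & 1;
    }
    
    // Receiver: rebuild the value from the bits (value must start at 0).
    for (n = 0; n < sizeof(uint16) * 8; ++n) {
        value |= uint16(stl_bitset[n] & 1) << n;
    }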
    
  • Is there a standard way of compensating for endianness?

Thanks in advance!

8 answers
\"骚年 ilove
#2 · 2019-01-21 02:53

Endianness is ALWAYS an issue. Some will say that if you know that every host connected to the network runs the same OS, etc, then you will not have problems. This is true until it isn't. You always need to publish a spec that details the EXACT format of on-wire data. It can be any format you want, but every endpoint needs to understand the format and be able to interpret it correctly.

In general, protocols use big-endian for numerical values, but this has limitations if not everyone is IEEE 754 compatible, etc. If you can take the overhead, then use XDR (or your favorite serialization format) and be safe.

We Are One
#3 · 2019-01-21 02:53

Here are some guidelines for endian-neutral C/C++ code. Obviously these are written as "rules to avoid"... so if code has these "features" it could be prone to endian-related bugs! (This is from my article on endianness published in Dr. Dobb's.)

  1. Avoid using unions which combine different multi-byte datatypes. (the layout of the unions may have different endian-related orders)

  2. Avoid accessing byte arrays outside of the byte datatype. (the order of the byte array has an endian-related order)

  3. Avoid using bit-fields and byte-masks (since the layout of the storage is dependent upon endianness, the masking of the bytes and selection of the bit fields is endian sensitive)

  4. Avoid casting pointers from multi-byte types to other byte types.
    (when a pointer is cast from one type to another, the endianness of the source (i.e. the original target) is lost and subsequent processing may be incorrect)
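To make the pointer-cast and byte-array rules concrete, here is a minimal sketch. The helper names (load_u16_host_order, load_u16_be, host_is_little_endian) are made up for illustration and are not from the article. The first helper does what a pointer cast would do (via memcpy, which is the well-defined way to do it) and so gives a host-dependent answer; the second rebuilds the value with shifts and gives the same answer everywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterprets two bytes using the host's own byte
// order, which is what a cast to uint16_t* would do. The result differs
// between little-endian and big-endian hosts.
inline std::uint16_t load_u16_host_order(const unsigned char* p) {
    assert(p != nullptr);
    std::uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Hypothetical helper: defines the bytes as big-endian and rebuilds the
// value arithmetically, so the host byte order never enters the picture.
inline std::uint16_t load_u16_be(const unsigned char* p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Reports whether this host stores the low-order byte first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

For the bytes {0x00, 0x01}, load_u16_be returns 1 on any host, while load_u16_host_order returns 256 on a little-endian host and 1 on a big-endian one.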

爷、活的狠高调
#4 · 2019-01-21 02:54

For the record, if you're transferring data between devices you should pretty much always use network byte order with ntohl, htonl, ntohs, htons. These convert between your host's byte order and the network byte order standard, regardless of what your system and the destination system use. Of course, both systems should be programmed like this, but they usually are in networking scenarios.
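As a rough sketch of how that looks for a 16-bit value, assuming a POSIX system where htons and ntohs come from <arpa/inet.h> (on Windows they are declared in <winsock2.h> instead). The wrapper names put_u16 and get_u16 are hypothetical, chosen only for this example.

```cpp
#include <arpa/inet.h>  // htons, ntohs (POSIX)
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sender-side helper: stores a 16-bit value into a buffer
// in network byte order (big-endian).
inline void put_u16(unsigned char* out, std::uint16_t host_value) {
    assert(out != nullptr);
    const std::uint16_t net = htons(host_value);  // host order -> network order
    std::memcpy(out, &net, sizeof net);
}

// Hypothetical receiver-side helper: reads a 16-bit value that was
// stored in network byte order and returns it in host order.
inline std::uint16_t get_u16(const unsigned char* in) {
    assert(in != nullptr);
    std::uint16_t net;
    std::memcpy(&net, in, sizeof net);
    return ntohs(net);  // network order -> host order
}
```

With this, put_u16(buf, 1) leaves the bytes 0x00 0x01 in buf on any host, and get_u16(buf) gives back 1, so the "1 arrives as 256" problem from the question does not occur.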

倾城 Initia
#5 · 2019-01-21 02:59

Very abstractly speaking, endianness is a property of the reinterpretation of a variable as a char-array.

Practically, this matters precisely when you read() from and write() to an external byte stream (like a file or a socket). Or, speaking abstractly again, endianness matters when you serialize data (essentially because serialized data has no type system and just consists of dumb bytes); and endianness does not matter within your programming language, because the language only operates on values, not on representations. Going from one to the other is where you need to dig into the details.

To wit - writing:

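uint32_t n = get_number();

// The casts keep only the low 8 bits and avoid a narrowing error inside C++11 braces.
unsigned char bytesLE[4] = { (unsigned char)n, (unsigned char)(n >> 8),
                             (unsigned char)(n >> 16), (unsigned char)(n >> 24) };  // little-endian order
unsigned char bytesBE[4] = { (unsigned char)(n >> 24), (unsigned char)(n >> 16),
                             (unsigned char)(n >> 8), (unsigned char)n };           // big-endian order

write(bytes..., 4);  // pseudocode: write whichever array matches the stream's byte order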

Here we could just have said, reinterpret_cast<unsigned char *>(&n), and the result would have depended on the endianness of the system.

And reading:

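unsigned char buf[4] = read_data();  // pseudocode: fill buf with 4 bytes from the stream

// Each shift is parenthesized (+ binds tighter than <<) and each byte is widened before shifting.
uint32_t n_LE = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) | ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24); // little-endian
uint32_t n_BE = (uint32_t)buf[3] | ((uint32_t)buf[2] << 8) | ((uint32_t)buf[1] << 16) | ((uint32_t)buf[0] << 24); // big-endian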

Again, here we could have said, uint32_t n = *reinterpret_cast<uint32_t*>(buf), and the result would have depended on the machine endianness.


As you can see, with integral types you never have to know the endianness of your own system, only of the data stream, if you use algebraic input and output operations. With other data types such as double, the issue is more complicated.
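For double, one common approach is to copy the bit pattern into a 64-bit integer and then serialize that integer with shifts. The sketch below is only valid under two stated assumptions: double is an 8-byte IEEE 754 binary64 value (checked at compile time), and the host stores doubles and 64-bit integers in the same byte order. The names store_f64_be and load_f64_be are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: IEEE 754 binary64 doubles. Compilation fails otherwise.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE 754 binary64 doubles");

// Hypothetical helper: writes a double as 8 bytes, most significant first.
inline void store_f64_be(unsigned char* out, double d) {
    assert(out != nullptr);
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // copy the bit pattern, no pointer cast
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));
}

// Hypothetical helper: reads 8 big-endian bytes back into a double.
inline double load_f64_be(const unsigned char* in) {
    assert(in != nullptr);
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | in[i];  // accumulate bytes, most significant first
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

The integer part of the work is the same shift-based technique as above; the extra complication is entirely in agreeing on the floating-point format itself.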

啃猪蹄的小仙女
#6 · 2019-01-21 02:59

The "standard way" of compensating is that the concept of "network byte order" has been defined, almost always (AFAIK) as big endian.

Senders and receivers both know the wire protocol, and if necessary will convert before transmitting and after receiving, to give applications the right data. But this translation happens inside your networking layer, not in your applications.

小情绪 Triste *
#7 · 2019-01-21 03:04

You shouldn't have to worry, unless you're at the border of the system. Normally, if you're talking in terms of the STL, you have already passed that border.

It's the task of the serialization protocol to indicate/determine how a series of bytes can be transformed into the type you're sending, be it a built-in type or a custom type.

If you're talking about built-in types only, the machine abstraction provided by the tools in your environment may suffice.
