Simple bitwise manipulation for little-endian inte

Posted 2019-02-18 07:21

Question:

For a specific need I am building a four byte integer out of four one byte chars, using nothing too special (on my little endian platform):

    return (( v1 << 24) | (v2 << 16) | (v3 << 8) | v4);

I am aware that an integer stored on a big-endian machine would look like AB BC CD DE in memory, instead of DE CD BC AB as on a little-endian one. Would that affect my operation completely, in that I would be shifting incorrectly, or would it just produce a correct result that is stored in reverse and needs to be reversed?

I was wondering whether to create a second version of this function that does (as yet unknown) bit manipulation for a big-endian machine, or possibly to use an ntohl-related function, although I am unclear how that would know whether my number is in the correct order or not.

What would be your suggestion to ensure compatibility, keeping in mind I do need to form integers in this manner?

Answer 1:

As long as you are working at the value level, there will be absolutely no difference in the results you obtain regardless of whether your machine is little-endian or big-endian. I.e. as long as you are using language-level operators (like | and << in your example), you will get exactly the same arithmetical result from the above expression on any platform. The endianness of the machine is not detectable and not visible at this level.
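
As a minimal sketch of this value-level approach, the expression from the question can be wrapped in a small function. The name pack_u32 and the use of the fixed-width types from <stdint.h> are assumptions made for illustration; they are not part of the original code. The casts are there because a plain char may be signed and is promoted to int before shifting, which can sign-extend or overflow for byte values of 0x80 and above.

```c
#include <stdint.h>

/* Hypothetical helper: combine four bytes into one 32-bit value,
   with b3 as the most significant byte and b0 as the least. */
uint32_t pack_u32(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0)
{
    /* Widen each byte to uint32_t before shifting, so the shift is
       done in an unsigned 32-bit type rather than in a promoted int. */
    return ((uint32_t)b3 << 24) |
           ((uint32_t)b2 << 16) |
           ((uint32_t)b1 << 8)  |
            (uint32_t)b0;
}
```

Calling pack_u32(0x12, 0x34, 0x56, 0x78) yields the value 0x12345678 on a little-endian and on a big-endian machine alike; only the way that value is laid out in memory differs.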

The only situations in which you need to care about endianness are those where the data you are working with is examined at the object representation level, i.e. where its raw memory representation is important. What you said above about "AB BC CD DE instead of DE CD BC AB" is specifically about the raw memory layout of the data. That's what functions like ntohl do: they convert one memory layout to another memory layout. So far you have given no indication that the actual raw memory layout is in any way important to you. Is it?

Again, if you only care about the value of the above expression, it is fully and totally endianness-independent. Basically, you are not supposed to care about endianness at all when you write C programs that don't attempt to access and examine the raw memory contents.



Answer 2:

although would it affect my operation completely in that I will be shifting incorrectly (?)

No.

The result will be the same regardless of the endian architecture. Bit shifting and twiddling are just like regular arithmetic operations. Is 2 + 2 the same on little endian and big endian architectures? Of course. 2 << 2 would be the same as well.

Little and big endian problems arise when you are dealing directly with the memory. You will run into problems when you do the following:

    char bytes[] = {1, 0, 0, 0};
    int n = *(int*)bytes;

On little-endian machines, n will equal 0x00000001. On big-endian machines, n will equal 0x01000000 (assuming a 32-bit int in both cases). This is when you will have to swap the bytes around.
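
One way to avoid that dependence, sketched here under the assumption that the byte order of the buffer is known in advance, is to assemble the value from individual bytes with shifts instead of casting the pointer. The helper names load_le32 and load_be32 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: interpret 4 bytes as a little-endian value
   (p[0] is the least significant byte). */
uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]        |
           ((uint32_t)p[1] << 8)  |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: interpret 4 bytes as a big-endian value
   (p[0] is the most significant byte). */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

For the bytes {1, 0, 0, 0}, load_le32 returns 0x00000001 and load_be32 returns 0x01000000 no matter which kind of machine runs the code, because the byte order is stated in the code rather than inherited from the hardware.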



Answer 3:

[Rewritten for clarity]

ntohl (and ntohs, etc.) is used primarily for moving data from one machine to another. If you're simply manipulating data on one machine, then it's perfectly fine to do bit-shifting without any further ceremony -- bit-shifting (at least in C and C++) is defined in terms of multiplying/dividing by powers of 2, so it works the same whether the machine is big-endian or little-endian.

When/if you need to (at least potentially) move data from one machine to another, it's typically sensible to use htonl before you send it, and ntohl when you receive it. Depending on the machines involved, the pair may amount to no-ops (BE to BE), to two identical transformations that cancel each other out (LE to LE), or to an actual byte swap (LE to BE or vice versa).
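
A rough sketch of that send/receive pairing follows. It assumes a POSIX system, where htonl and ntohl are declared in <arpa/inet.h>; the helper names put_net32 and get_net32 are made up for this example, and the buffer stands in for whatever is actually sent over the wire.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX; assumed available) */
#include <stdint.h>
#include <string.h>

/* Hypothetical sender side: store a host value in a 4-byte buffer
   in network (big-endian) byte order. */
void put_net32(unsigned char out[4], uint32_t host_value)
{
    uint32_t wire = htonl(host_value);   /* host order -> network order */
    memcpy(out, &wire, sizeof wire);     /* copy the raw bytes out */
}

/* Hypothetical receiver side: read 4 network-order bytes back into
   a value in the host's own byte order. */
uint32_t get_net32(const unsigned char in[4])
{
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);      /* copy the raw bytes in */
    return ntohl(wire);                  /* network order -> host order */
}
```

After put_net32(buf, 0x12345678), the buffer holds the bytes 12 34 56 78 on any host, and get_net32(buf) returns 0x12345678 again, whether or not the two calls run on machines with the same endianness.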



Answer 4:

FWIW, I think a lot of what has been said here is correct. However, if the programmer has coded with endianness in mind, say using masks for bitwise inspection and manipulation, then cross-platform results could be unexpected.

You can determine 'endianness' at runtime as follows:

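    #define LITTLE_ENDIAN 0
    #define BIG_ENDIAN    1

    /* Returns LITTLE_ENDIAN or BIG_ENDIAN for the machine this runs on. */
    int endian(void) {
        int i = 1;
        char *p = (char *)&i;   /* view the first byte of i in memory */

        if (p[0] == 1)          /* low-order byte stored first */
            return LITTLE_ENDIAN;
        else
            return BIG_ENDIAN;
    }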

... and proceed accordingly.

I borrowed the code snippet from here: http://www.ibm.com/developerworks/aix/library/au-endianc/index.html?ca=drs- where there is also an excellent discussion of these issues.

hth -

Perry