Reverse the Endianness of a C structure

Posted 2019-03-22 08:03

I have a structure in C that looks like this:

typedef u_int8_t NN;
typedef u_int8_t X;
typedef int16_t S;
typedef u_int16_t U;
typedef char C;

typedef struct{
 X test;
 NN test2[2];
 C test3[4];
 U test4;
} Test;

I have declared the structure and written values to the fields as follows:

Test t;
int t_buflen = sizeof(t);
memset( &t, 0, t_buflen);
t.test = 0xde;
t.test2[0]=0xad; t.test2[1]=0x00;
t.test3[0]=0xbe; t.test3[1]=0xef; t.test3[2]=0x00; t.test3[3]=0xde;
t.test4=0xdeca; 

I am sending this structure via UDP to a server. At present this works fine when I test locally. However, I now need to send this structure from my little-endian machine to a big-endian machine, and I'm not really sure how to do this.

I've looked into using htons, but I'm not sure it's applicable in this situation, as it seems to be defined only for unsigned ints of 16 or 32 bits, if I understood correctly.

Tags: c endianness
3 answers
疯言疯语
#2 · 2019-03-22 08:42

A structure has no endianness as such. It is the individual multi-byte fields that need to be converted to big-endian when needed. You can make a copy of the structure, rewrite each such field using htons/htonl, and then send the result. 8-bit fields don't need any modification, of course.
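The copy-and-convert approach described above could look roughly like the sketch below. The helper names test_to_network and test_to_host are made up for illustration, the struct uses the standard uint8_t/uint16_t names in place of the question's typedefs, and <arpa/inet.h> assumes a POSIX system. Note that this only fixes byte order; it still sends the struct's in-memory layout, padding included.

```c
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the question's Test, written with <stdint.h> types. */
typedef struct {
    uint8_t  test;
    uint8_t  test2[2];
    char     test3[4];
    uint16_t test4;
} Test;

/* Hypothetical helper: return a copy of *host with every multi-byte
 * field converted to network (big-endian) byte order. */
static Test test_to_network(const Test *host)
{
    assert(host != NULL);
    Test wire = *host;               /* 8-bit fields are copied unchanged */
    wire.test4 = htons(host->test4); /* the only 16-bit field */
    return wire;
}

/* Hypothetical helper: the inverse conversion, for the receiving side. */
static Test test_to_host(const Test *wire)
{
    assert(wire != NULL);
    Test host = *wire;
    host.test4 = ntohs(wire->test4);
    return host;
}
```

The sender would call test_to_network(&t) and pass the returned copy to sendto(); the receiver would run test_to_host() on what recvfrom() delivered. The original t stays in host order, so it remains usable locally.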

In the case of TCP you could also just send each part separately and count on Nagle's algorithm to merge the parts into a single packet, but with UDP you need to prepare everything up front.

聊天终结者
#3 · 2019-03-22 08:47

I think there may be two issues here, depending on how you're sending this data over UDP.

Issue 1: Endianness

As you've said, endianness is an issue. You're right to mention htons and ntohs for shorts. You may also find htonl and its opposite, ntohl, useful.

Endianness concerns the byte ordering of multi-byte data types in memory. Therefore, you do not have to worry about single-byte data types. In your case it is the 2-byte data that I guess you're asking about.

To use these functions you will need to do something like the following...

Sender:
-------
t.test     = 0xde; // Does not need to be swapped
t.test2[0] = 0xad; ... // Does not need to be swapped
t.test3[0] = 0xbe; ... // Does not need to be swapped
t.test4    = htons(0xdeca); // Needs to be swapped 

...

sendto(..., &t, ...);


Receiver:
---------
recvfrom(..., &t, ...);
t.test4    = ntohs(t.test4); // Convert the received value back to host order

htons() and ntohs() convert to and from network byte order, which is big-endian. Therefore your little-endian machine byte-swaps t.test4 before sending, and on receipt the big-endian machine just uses the value as read (there, ntohs() is effectively a no-op).
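To see what the swap does in memory, here is a small sketch that copies out the raw bytes of a 16-bit value. The helper names u16_memory_bytes and host_is_little_endian are invented for this example, and <arpa/inet.h> again assumes a POSIX system.

```c
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the in-memory bytes of a 16-bit value
 * into out[0] and out[1], lowest address first. */
static void u16_memory_bytes(uint16_t value, uint8_t out[2])
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    uint8_t b[2];
    u16_memory_bytes(0x0001, b);
    return b[0] == 0x01;
}
```

On any host, u16_memory_bytes(htons(0xdeca), b) leaves b[0] == 0xde and b[1] == 0xca, which is the order the bytes go onto the wire. Without htons(), a little-endian host would store 0xca first. ntohs(htons(x)) always gives back x.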

[Diagram: Endian swapping]

If you did not want to use the htons() function and its variants, then you could just define the buffer format at the byte level.

[Diagram: Define byte format of buffer]

In this case your code might look something like

Sender:
-------
uint8_t buffer[SOME SIZE];
t.test     = 0xde;
t.test2[0] = 0xad; ... 
t.test3[0] = 0xbe; ... 
t.test4    = 0xdeca;

buffer[0] = t.test;
buffer[1] = t.test2[0];
// and so on, until...
buffer[7] = t.test4 & 0xff;
buffer[8] = (t.test4 >> 8) & 0xff;    

...

sendto(..., buffer, ...);

Receiver:
---------
uint8_t buffer[SOME SIZE];
recvfrom(..., buffer, ...);

t.test     = buffer[0];
t.test2[0] = buffer[1];
// and so on, until...
t.test4    = buffer[7] | (buffer[8] << 8);

The send and receive code will work regardless of the respective endianness of the sender and receiver because the byte-layout of the buffer is defined and known by the program running on both machines.
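Filled in completely for the question's structure, the byte-level approach might look like the sketch below. The names test_pack, test_unpack and TEST_WIRE_SIZE are invented for illustration, and the layout follows the snippets above: one byte for test, two for test2, four for test3, then test4 with its low byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  test;
    uint8_t  test2[2];
    char     test3[4];
    uint16_t test4;
} Test;

enum { TEST_WIRE_SIZE = 9 };  /* 1 + 2 + 4 + 2 bytes, no padding on the wire */

/* Hypothetical helper: write *t into buf using the fixed byte layout. */
static void test_pack(const Test *t, uint8_t buf[TEST_WIRE_SIZE])
{
    assert(t != NULL && buf != NULL);
    buf[0] = t->test;
    buf[1] = t->test2[0];
    buf[2] = t->test2[1];
    memcpy(&buf[3], t->test3, 4);
    buf[7] = (uint8_t)(t->test4 & 0xff);         /* low byte first */
    buf[8] = (uint8_t)((t->test4 >> 8) & 0xff);  /* then high byte */
}

/* Hypothetical helper: rebuild *t from a buffer written by test_pack. */
static void test_unpack(Test *t, const uint8_t buf[TEST_WIRE_SIZE])
{
    assert(t != NULL && buf != NULL);
    t->test     = buf[0];
    t->test2[0] = buf[1];
    t->test2[1] = buf[2];
    memcpy(t->test3, &buf[3], 4);
    t->test4 = (uint16_t)(buf[7] | (buf[8] << 8));
}
```

The shifts operate on values rather than on memory, so the same source produces the same nine bytes on either kind of machine. The sender passes buf and TEST_WIRE_SIZE to sendto() instead of &t and sizeof(t).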

However, if you're sending your structure through the socket in this way you should also note the caveat below...

Issue 2: Data alignment

The article "Data alignment: Straighten up and fly right" is a great read for this one...

The other problem you might have is data alignment. It will not always bite you, even between machines that use different endian conventions, but it is nevertheless something to watch out for...

struct
{
    uint8_t  v1;
    uint16_t v2;
};

In the above bit of code the offset of v2 from the start of the structure could be 1 byte, 2 bytes, 4 bytes (or just about anything). The compiler cannot re-order members in your structure, but it can pad the distance between variables.

Let's say machine 1 has a 16-bit wide data bus. If we took the structure without padding, v2 would start at an odd address and the machine would have to do two fetches to get it. Why? Because we access 2 bytes of memory at a time at the hardware level. Therefore the compiler could pad out the structure like so

struct
{
    uint8_t  v1;
    uint8_t  invisible_padding_created_by_compiler;
    uint16_t v2;
};

If the sender and receiver differ on how they pack data into a structure then just sending the structure as a binary blob will cause you problems. In this case you may have to pack the variables into a byte stream/buffer manually before sending. This is often the safest way.
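One way to check what a particular compiler does is to ask it with offsetof and sizeof. This is a minimal sketch; the struct tag Sample and the two helper functions are names made up for this example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

struct Sample {
    uint8_t  v1;
    uint16_t v2;
};

/* The standard guarantees no padding before the first member. */
static_assert(offsetof(struct Sample, v1) == 0, "v1 must be at offset 0");

/* Hypothetical helper: padding bytes inserted between v1 and v2. */
static size_t sample_padding_before_v2(void)
{
    return offsetof(struct Sample, v2) - sizeof(uint8_t);
}

/* Hypothetical helper: all padding bytes in the struct, including any
 * added after v2. */
static size_t sample_padding_total(void)
{
    return sizeof(struct Sample) - (sizeof(uint8_t) + sizeof(uint16_t));
}
```

Many common compilers place v2 at offset 2 and report a size of 4, but that is a property of the platform and is not guaranteed by the language. If the two ends of the connection disagree on these numbers, sending the raw struct will not work.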

虎瘦雄心在
#4 · 2019-03-22 08:53

The data you are sending over the network should be the same regardless of the endianness of the machines involved. The keyword you need to research is serialization. This means converting a data structure to a series of bits/bytes to be sent over a network or saved to disk, in a form that is always the same regardless of architecture or compiler.
