struct hack - zero sized array

Posted 2019-04-29 01:00

Question:

#include <iostream>
using namespace std;

struct node1{
    char b[3];
    int c[0];
};

struct node2{
    int c[0];
};

struct node3{
    char b[3];
};


int main() {

    cout << sizeof(node1) << endl;  // prints 4
    cout << sizeof(node2) << endl;  // prints 0
    cout << sizeof(node3) << endl;  // prints 3
}

My question is why the compiler allocates 0 bytes for int c[0] in node2 but appears to allocate 1 byte for it when it is part of node1. I'm assuming this 1 byte is the reason sizeof(node1) returns 4, since without it (as in node3) the size is 3. Or is that due to padding?

I'm also trying to understand: shouldn't node2 have enough space to hold a pointer to an array (which will be allocated further down in the code as part of the flexible array/struct hack)?

Answer 1:

Yes, it's about padding/alignment. If you add __attribute__((__packed__)) after the closing brace of each struct (a GCC/Clang extension, useful when writing device drivers), you'll get 3 0 3 for your output.

If node1 had declared c[1] instead, its size would be 8, not 7, because the compiler aligns c to an int boundary. With packed, sizeof would be 7.



Answer 2:

Yes, padding makes the difference. The reason why node1 has a padding byte, while node3 doesn't, lies in the typical usage of zero-length arrays.

Zero-length arrays are typically used with casting: you cast a larger (possibly variable-sized) object to the struct containing the zero-length array. Then you access the "rest" of the large object through the zero-length array, which, for this purpose, has to be aligned properly. The padding byte is inserted before the zero-sized array so that the ints are aligned. Since you can't do that with node3, no padding is needed.

Example:

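#include <cstddef>   // size_t
#include <cstring>   // memcmp

struct Message {
    char Type[3];
    int Data[];    // it compiles without putting 0 explicitly
};

void ReceiveMessage(unsigned char* buffer, size_t length) {
    // Reject buffers too short to hold the fixed-size header.
    if (length < sizeof(Message))
        return;
    // Reinterpret the raw bytes; buffer must be suitably aligned for int.
    Message* msg = (Message*)buffer;
    if (!memcmp(msg->Type, "GET", 3)) {
        // Everything after the header is treated as an array of ints.
        HandleGet(msg->Data, (length - sizeof(Message)) / sizeof(int));
    } else if....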

Note: this is rather hackish, but efficient.



Answer 3:

c doesn't take up a byte in node1. The extra byte is padding added after b.

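The int member c has to start on a 4-byte boundary on typical platforms, so the compiler inserts one padding byte after the three-byte b, which brings sizeof(node1) to 4. A 32-bit CPU reads 4 consecutive, aligned bytes from memory at a time; an int that straddles such a boundary would need two reads (or would fault on some architectures). To avoid that, the compiler pads the struct. node3 contains only chars, which have no such requirement, so it stays at 3 bytes.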

You can observe similar alignment behavior when values are pushed on the stack (that is, when arguments or local variables are allocated). The stack is typically kept aligned to at least the CPU's word size (commonly 32 or 64 bits), as the platform ABI requires.



Answer 4:

int main() {

  cout << sizeof(node1) << endl;  // prints 4
  cout << sizeof(node2) << endl;  // prints 0
  cout << sizeof(node3) << endl;  // prints 3
}

The main function queries the size of the user-defined structs, not of the array members. sizeof returns the number of bytes the struct occupies, including any padding, with each element of the character array taking 1 byte. Note that char b[3] holds exactly three chars: a character array is only a C-style string if you store a terminating '\0' in it yourself, and no extra byte is reserved for that sentinel. The fourth byte in sizeof(node1) is therefore alignment padding in front of the int member that follows b, not a terminator. In node3 nothing follows b that needs alignment, so the struct ends after 3 bytes.