Is struct packing deterministic?

Posted 2020-02-08 03:06

For example, say I have two equivalent structs a and b in different projects:

typedef struct _a
{
    int a;
    double b;
    char c;
} a;

typedef struct _b
{
    int d;
    double e;
    char f;
} b;

Assuming I haven't used any directives like #pragma pack and these structs are compiled on the same compiler on the same architecture, will they have identical padding between variables?

Tags: c padding
8 answers
Juvenile、少年°
#2 · 2020-02-08 03:28

The C standard itself leaves padding to the implementation, so in principle you cannot be sure.

But your compiler most probably adheres to some particular ABI; otherwise communicating with other libraries and with the operating system would be a nightmare. In that case, the ABI will usually prescribe exactly how packing works.

For example:

  • on x86_64 Linux/BSD, the System V AMD64 ABI is the reference. Its §3.1 lists, for every primitive processor data type, the corresponding C type, its size and its alignment requirement, and explains how to use this data to build the memory layout of bit-fields, structs and unions; everything (apart from the actual content of the padding) is specified and deterministic. The same holds for many other architectures, see these links.

  • ARM recommends its EABI for its processors, and it's generally followed by both Linux and Windows; the alignment of aggregates is specified in "Procedure Call Standard for the ARM Architecture Documentation", §4.3.

  • on Windows there's no cross-vendor standard, but VC++ essentially dictates the ABI, to which virtually every compiler adheres; it can be found here for x86_64, here for ARM (but for the part relevant to this question it just refers to the ARM EABI).
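The rule these ABIs typically prescribe can be sketched in a few lines: each member goes at the next offset that is a multiple of its alignment, and the total size is rounded up to the largest member alignment. The following is only an illustration under that assumption (the names `layout`, `align_up`, `predict_layout` and `prediction_matches_compiler` are made up here, and `_Alignof` requires C11); it predicts the layout of the struct from the question and compares it with what the compiler actually produced.

```c
#include <assert.h>
#include <stddef.h>

/* Same members as the structs in the question. */
struct sample { int a; double b; char c; };

/* Hypothetical helper type: offsets of the three members plus total size. */
typedef struct { size_t off_a, off_b, off_c, size; } layout;

/* Round offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) / align * align;
}

/* Predict the layout using the "natural alignment" rule described above. */
static layout predict_layout(void) {
    layout l;
    size_t max_align = _Alignof(int);
    if (_Alignof(double) > max_align) max_align = _Alignof(double);

    l.off_a = 0;
    l.off_b = align_up(l.off_a + sizeof(int), _Alignof(double));
    l.off_c = align_up(l.off_b + sizeof(double), _Alignof(char));
    l.size  = align_up(l.off_c + sizeof(char), max_align);
    return l;
}

/* Returns 1 if the prediction agrees with the compiler's real layout. */
static int prediction_matches_compiler(void) {
    layout l = predict_layout();
    return l.off_a == offsetof(struct sample, a)
        && l.off_b == offsetof(struct sample, b)
        && l.off_c == offsetof(struct sample, c)
        && l.size  == sizeof(struct sample);
}
```

On x86_64 under the System V ABI this rule gives offsets 0, 8 and 16 and a size of 24 bytes. An ABI is free to specify something else, so treat the rule as a common convention and not as a guarantee.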

三岁会撩人
#3 · 2020-02-08 03:29

Any particular compiler ought to be deterministic, but between any two compilers, or even the same compiler with different compilation options, or even between different versions of the same compiler, all bets are off.

You're much better off if you don't depend on the details of the structure, or if you do, you should embed code to check at runtime that the structure is actually laid out as you expect.
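One way such a check might look, as a minimal sketch: record the layout your code depends on and compare it with what the current build produces. The names `record`, `record_layout`, `current_layout` and `layout_matches` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* The structure whose layout the program depends on. */
struct record { int a; double b; char c; };

/* Hypothetical description of a layout: member offsets plus total size. */
struct record_layout { size_t off_a, off_b, off_c, size; };

/* Layout of struct record as produced by the current compiler and options. */
static struct record_layout current_layout(void) {
    struct record_layout l = {
        offsetof(struct record, a),
        offsetof(struct record, b),
        offsetof(struct record, c),
        sizeof(struct record)
    };
    return l;
}

/* Returns 1 if the current build lays out struct record as expected. */
static int layout_matches(const struct record_layout *expected) {
    struct record_layout now = current_layout();
    assert(expected != NULL);
    return now.off_a == expected->off_a
        && now.off_b == expected->off_b
        && now.off_c == expected->off_c
        && now.size  == expected->size;
}
```

A program could call `layout_matches` once at startup with the values its file format or peer assumes (for instance `{0, 8, 16, 24}`, which is what x86_64 System V produces for this struct) and refuse to continue when it returns 0.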

A good example of this is the recent change from 32-bit to 64-bit architectures, where even if you didn't change the size of the integers used in a structure, the default packing of partial integers changed; where previously three 32-bit integers in a row would pack perfectly, now they pack into two 64-bit slots.

You can't possibly anticipate what changes may occur in the future; if you depend on details that are not guaranteed by the language, such as structure packing, you ought to verify your assumptions at runtime.

▲ chillily
#4 · 2020-02-08 03:31

You cannot rely on a deterministic layout for a structure or union in C across different systems.

Although the layouts generated by different compilers often turn out to be the same, you must regard this as a convergence driven by practical convenience in compiler design, within the freedom the standard leaves to implementations, and not as a guarantee.

The C11 standard ISO/IEC 9899:2011, almost unchanged from previous standards, states in paragraph 6.7.2.1, Structure and union specifiers:

Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.

Even worse is the case of bit-fields, where a great deal of latitude is left to the implementation:

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

Just count how many times the terms 'implementation-defined' and 'unspecified' appear in the text.
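To make the distinction concrete, here is a small sketch (the struct `flags` and both function names are invented for illustration). Values stored in bit-fields read back unchanged everywhere, but the bytes that hold them depend on the allocation order and storage unit the implementation chose, so no particular byte value is claimed here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag word built from bit-fields. */
struct flags {
    unsigned int ready : 1;
    unsigned int mode  : 3;
    unsigned int count : 4;
};

/* Portable: a value written to a bit-field reads back unchanged. */
static unsigned int roundtrip_mode(unsigned int mode) {
    struct flags f = { 0 };
    f.mode = mode & 7u;          /* keep the value within 3 bits */
    return f.mode;
}

/* Not portable: which bit of the first byte is set depends on the
   implementation-defined allocation order and on byte order. */
static unsigned char first_byte_with_ready_set(void) {
    struct flags f;
    unsigned char byte;
    memset(&f, 0, sizeof f);     /* clear padding bits too */
    f.ready = 1;
    memcpy(&byte, &f, 1);        /* inspect the object representation */
    return byte;
}
```

Code that only uses `f.mode` and friends is unaffected by these choices; code that writes the raw bytes to a file or socket is exposed to all of them.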

Given that checking the compiler version, machine and target architecture on every run, before using a structure or union generated on a different system, is impractical, this should be a decent answer to your question.

That said, there is a workaround.

To be clear, it is not a definitive solution, but it is a common approach when data structures are exchanged between different systems: pack the structure members with a packing value of 1 (the size of a char).

Packing, combined with a careful structure definition, can give a declaration that is sufficiently reliable across different systems. Packing forces the compiler to drop its implementation-defined alignment, which reduces the incompatibilities the standard allows. Avoiding bit-fields removes the remaining implementation-dependent inconsistencies. Finally, the access efficiency lost along with the alignment can be recovered by manually adding dummy members between fields, crafted to put each field back on its natural alignment.

As a residual case, consider the padding that some compilers add at the end of a structure; because it holds no useful data you can ignore it (except for dynamic allocation, where you can account for it).
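A sketch of this approach might look as follows. It assumes a compiler that accepts `#pragma pack(push, 1)` (GCC, Clang and MSVC do), an 8-byte `double`, and C11 for `_Static_assert`; the names `wire_record` and `wire_record_init` are invented for the example. The dummy members take the place of the padding the compiler would otherwise insert, including the padding at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
struct wire_record {
    int32_t a;
    uint8_t pad0[4];   /* manual padding: keeps b at an 8-byte offset */
    double  b;
    char    c;
    uint8_t pad1[7];   /* manual tail padding: fixes the size at 24 */
};
#pragma pack(pop)

/* Fail the build if the layout is not the one intended. */
_Static_assert(offsetof(struct wire_record, b) == 8,  "b must be at offset 8");
_Static_assert(offsetof(struct wire_record, c) == 16, "c must be at offset 16");
_Static_assert(sizeof(struct wire_record) == 24,      "record must be 24 bytes");

/* Fill a record; zeroing first gives the padding bytes a known content. */
static void wire_record_init(struct wire_record *r, int32_t a, double b, char c) {
    assert(r != NULL);
    memset(r, 0, sizeof *r);
    r->a = a;
    r->b = b;
    r->c = c;
}
```

Because the padding is now made of named members that are zeroed, two records with equal fields also compare equal with `memcmp`. Byte order and floating-point format are separate concerns that packing does not address.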

对你真心纯属浪费
#5 · 2020-02-08 03:31

Yes. You should always assume deterministic behaviour from your compiler.

[EDIT] From the comments below, it is obvious there are many Java programmers reading the question above. Let's be clear: C structs do not generate any name, hash, or the like in object files, libraries, or DLLs. C function signatures do not refer to them either. This means the member names can be changed at whim - really! - provided the types and order of the member variables stay the same. In C, the two structures in the example are equivalent, since packing does not change. This means that the following abuse is perfectly valid in C, and there's certainly much worse abuse to be found in some of the most widely used libraries.

[EDIT2] No one should ever dare to do any of the following in C++

#include <assert.h>
#include <string.h>

/* the 3 structures below are 100% binary compatible */
struct _a { int a; double b; char c; };
struct _b { int d; double e; char f; };
typedef struct SOME_STRUCT { int my_i; double my_f; char my_c[1]; } SOME_STRUCT;

struct _a a = { 1, 2.5, 'z' };
struct _b b;

/* the following is valid, copy a -> b (a is the initialized one) */
*(SOME_STRUCT*)&b = *(SOME_STRUCT*)&a;
assert(((SOME_STRUCT*)&a)->my_c[0] == b.f);
assert(a.c == b.f);

/* more generally these identities are always true. */
assert(sizeof(a) == sizeof(b));
assert(memcmp(&a, &b, sizeof(a)) == 0);
assert(pure_function_requiring_a(&a) == pure_function_requiring_a((struct _a*)&b));
assert(pure_function_requiring_b((struct _b*)&a) == pure_function_requiring_b(&b));

function_requiring_a_SOME_STRUCT_pointer(&a);  /* may generate a warning, but not all compilers will */
/* etc... the name space abuse is limited to the programmer's imagination */
SAY GOODBYE
#6 · 2020-02-08 03:35

Any sane compiler will produce identical memory layout for the two structs. Compilers are usually written as perfectly deterministic programs. Non-determinism would need to be added explicitly and deliberately, and I for one fail to see the benefit of doing so.

However, that does not allow you to cast a struct _a* to a struct _b* and access its data via both. Afaik, this would still be a violation of strict aliasing rules even if the memory layout is identical, as it would allow the compiler to reorder accesses via the struct _a* with accesses via the struct _b*, which would result in unpredictable, undefined behavior.
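If the goal is to reuse the bytes of one struct as the other, copying with `memcpy` avoids accessing an object through a pointer to an unrelated struct type. The sketch below is one possible way to do it, with `rec_a`, `rec_b` and `rec_b_from_a` as stand-in names for the question's structs; the `_Static_assert` lines (C11) make the build fail if the two layouts ever differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the two structs from the question. */
struct rec_a { int a; double b; char c; };
struct rec_b { int d; double e; char f; };

/* The copy below is only meaningful if both layouts really are identical. */
_Static_assert(sizeof(struct rec_a) == sizeof(struct rec_b), "sizes differ");
_Static_assert(offsetof(struct rec_a, b) == offsetof(struct rec_b, e),
               "second member is at a different offset");
_Static_assert(offsetof(struct rec_a, c) == offsetof(struct rec_b, f),
               "third member is at a different offset");

/* Build a rec_b from the bytes of a rec_a without pointer-casting between
   the two struct types. */
static struct rec_b rec_b_from_a(const struct rec_a *src) {
    struct rec_b dst;
    assert(src != NULL);
    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

Compilers generally turn a fixed-size `memcpy` like this into the same loads and stores a direct assignment would use, so the safer form should not cost anything in practice.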

淡お忘
#7 · 2020-02-08 03:36

will they have identical padding between variables?

In practice, they are most likely to have the same memory layout.

In theory, since the standard doesn't say much on how padding should be employed on objects, you can't really assume anything on the padding between the elements.

Also, I can't even see why you would want to know or assume anything about the padding between the members of a struct. Simply write standard-compliant C code and you'll be fine.
