For example, say I have two equivalent structs a and b in different projects:
typedef struct _a
{
    int a;
    double b;
    char c;
} a;
typedef struct _b
{
    int d;
    double e;
    char f;
} b;
Assuming I haven't used any directives like #pragma pack and these structs are compiled on the same compiler on the same architecture, will they have identical padding between variables?
The C standard itself leaves the amount of padding to the implementation, so in principle you just cannot be sure.
But your compiler most probably adheres to some particular ABI; otherwise communicating with other libraries and with the operating system would be a nightmare. In that case, the ABI will usually prescribe exactly how packing works.
For example:
On x86_64 Linux/BSD, the System V AMD64 ABI is the reference. For every primitive processor data type, §3.1 details the corresponding C type, its size and its alignment requirement, and explains how to use this data to build the memory layout of bit-fields, structs and unions; everything (besides the actual content of the padding) is specified and deterministic. The same holds for many other architectures, see these links.
ARM recommends its EABI for its processors, and it is generally followed by both Linux and Windows; the alignment of aggregates is specified in "Procedure Call Standard for the ARM Architecture Documentation", §4.3.
On Windows there is no cross-vendor standard, but VC++ essentially dictates the ABI, to which virtually every compiler adheres; it can be found here for x86_64, here for ARM (but for the part relevant to this question it just refers to the ARM EABI).
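To see what a given ABI actually produces for the two structs in the question, one option is to compare the member offsets directly. This is only a sketch, assuming a C11 compiler (for static_assert); the function name print_layout is made up for illustration. On x86_64 with the System V ABI I would expect offsets 0, 8 and 16 and a total size of 24, but the point is to check rather than to trust that.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdio.h>

typedef struct _a { int a; double b; char c; } a;
typedef struct _b { int d; double e; char f; } b;

/* Compile-time check: the build fails if this compiler lays out
   the two structs differently. */
static_assert(sizeof(a) == sizeof(b), "a and b differ in size");
static_assert(offsetof(a, b) == offsetof(b, e), "second member offset differs");
static_assert(offsetof(a, c) == offsetof(b, f), "third member offset differs");

/* Hypothetical helper: prints the layout this compiler chose for a,
   so the numbers can be compared with the ABI document. */
void print_layout(void)
{
    printf("offsetof(a, a) = %zu\n", offsetof(a, a));
    printf("offsetof(a, b) = %zu\n", offsetof(a, b));
    printf("offsetof(a, c) = %zu\n", offsetof(a, c));
    printf("sizeof(a)      = %zu\n", sizeof(a));
}
```

The static assertions only prove that the two structs match each other in this one build; they say nothing about another compiler, another version or other options.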
Any particular compiler ought to be deterministic, but between any two compilers, or even the same compiler with different compilation options, or even between different versions of the same compiler, all bets are off.
You're much better off if you don't depend on the details of the structure; if you do, you should embed code to check at runtime that the structure is actually laid out the way you depend on.
A good example of this is the recent move from 32-bit to 64-bit architectures: even if you didn't change the size of the integers used in a structure, the default packing of partial integers changed; where previously three 32-bit integers in a row would pack perfectly, now they pack into two 64-bit slots.
You can't possibly anticipate what changes may occur in the future; if you depend on details that are not guaranteed by the language, such as structure packing, you ought to verify your assumptions at runtime.
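A runtime check of that kind could look roughly like the sketch below. The names layout_check, count_layout_mismatches and a_layout_is_as_expected are hypothetical, and the expected numbers (8, 16, 24) are only an assumption recorded from an x86_64 System V build; substitute whatever values your own code was written against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct _a { int a; double b; char c; } a;

/* One layout assumption: the value the code was written against,
   and the value the current build actually produced. */
struct layout_check {
    const char *what;
    size_t expected;
    size_t actual;
};

/* Returns the number of violated assumptions, reporting each on stderr. */
int count_layout_mismatches(const struct layout_check *checks, size_t n)
{
    assert(checks != NULL || n == 0);
    int mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        if (checks[i].expected != checks[i].actual) {
            fprintf(stderr, "layout mismatch: %s: expected %zu, got %zu\n",
                    checks[i].what, checks[i].expected, checks[i].actual);
            mismatches++;
        }
    }
    return mismatches;
}

/* Returns 1 if struct a matches the assumed layout, 0 otherwise.
   The expected values are illustrative assumptions, not guarantees. */
int a_layout_is_as_expected(void)
{
    const struct layout_check checks[] = {
        { "offsetof(a, b)", 8,  offsetof(a, b) },
        { "offsetof(a, c)", 16, offsetof(a, c) },
        { "sizeof(a)",      24, sizeof(a) },
    };
    return count_layout_mismatches(checks, sizeof checks / sizeof checks[0]) == 0;
}
```

Calling a_layout_is_as_expected() once at startup, and refusing to read or write the structure if it returns 0, turns a silent data corruption into a visible error.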
The layout of a structure or union in C cannot be determined in advance across different systems.
Although the layouts generated by different compilers often appear to be the same, you should treat such cases as a convergence driven by practical and functional convenience in compiler design, within the freedom that the standard leaves to the implementer, and therefore not as something you can rely on.
The C11 standard, ISO/IEC 9899:2011, almost unchanged from previous standards, clearly states in paragraph 6.7.2.1, Structure and union specifiers:
Even worse is the case of bit-fields, where a large degree of freedom is left to the implementer:
Just count how many times the terms 'implementation-defined' and 'unspecified' appear in the text.
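Because so much about bit-fields is implementation-defined, code that needs fixed bit positions often uses explicit shifts and masks on a fixed-width integer instead. A minimal sketch, assuming a made-up 8-bit header with a 3-bit kind and a 5-bit length (all names here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-bit header: bits 0-2 hold "kind", bits 3-7 hold "length". */
#define KIND_MASK    0x07u
#define LENGTH_SHIFT 3
#define LENGTH_MASK  0x1Fu

/* Builds the header byte; the bit positions are fixed by the code,
   not chosen by the compiler. */
uint8_t header_pack(unsigned kind, unsigned length)
{
    assert(kind <= KIND_MASK && length <= LENGTH_MASK);
    return (uint8_t)((kind & KIND_MASK) |
                     ((length & LENGTH_MASK) << LENGTH_SHIFT));
}

unsigned header_kind(uint8_t h)   { return h & KIND_MASK; }
unsigned header_length(uint8_t h) { return (h >> LENGTH_SHIFT) & LENGTH_MASK; }
```

The equivalent bit-field declaration would be shorter, but the order and placement of its fields inside the storage unit would be up to the compiler.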
Granted that checking the compiler version, machine and target architecture on every run, before using a structure or union generated on a different system, is impractical, you should still get a decent answer to your question.
So let's say that yes, there is a workaround.
To be clear, it is not a definitive solution, but it is a common approach that you will find wherever data structures are exchanged between different systems: pack the structure elements with a packing value of 1 (the standard char size).
Packing combined with a careful structure definition can yield a sufficiently reliable declaration that can be used on different systems. Packing forces the compiler to drop its implementation-defined alignment, reducing the incompatibilities that the standard permits. Moreover, by avoiding bit-fields you remove the remaining implementation-dependent inconsistencies. Finally, the access efficiency lost through the missing alignment can be recovered by manually adding dummy members between elements, crafted to force each field back onto its correct alignment.
As a residual case, you have to consider the padding that some compilers add at the end of the structure, but because no useful data is associated with it you can ignore it (except for dynamic space allocation, but again you can deal with that).
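The packing approach described above could be sketched as follows. This assumes a compiler that accepts #pragma pack(push, 1) (GCC, Clang and MSVC do, but it is an extension, not standard C), 8-bit bytes and an 8-byte double; the struct name wire_record and the pad members are invented for the example. It does not address byte order or floating-point representation, which still differ between systems.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

static_assert(sizeof(double) == 8, "this sketch assumes an 8-byte double");

#pragma pack(push, 1)           /* no compiler-inserted padding from here on */
typedef struct wire_record {
    int32_t a;                  /* offset 0 */
    uint8_t pad0[4];            /* manual padding: keeps b at a multiple of 8 */
    double  b;                  /* offset 8 */
    char    c;                  /* offset 16 */
    uint8_t pad1[7];            /* explicit tail padding: total size 24 */
} wire_record;
#pragma pack(pop)               /* restore the previous packing */

/* With all padding spelled out, the layout can be pinned down at compile time. */
static_assert(offsetof(wire_record, a) == 0,  "unexpected offset of a");
static_assert(offsetof(wire_record, b) == 8,  "unexpected offset of b");
static_assert(offsetof(wire_record, c) == 16, "unexpected offset of c");
static_assert(sizeof(wire_record) == 24,      "unexpected size");
```

Using int32_t instead of int removes one more variable, since the width of int is itself implementation-defined.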
Yes. You should always assume deterministic behaviour from your compiler.
[EDIT] From the comments below, it is obvious there are many Java programmers reading the question above. Let's be clear: C structs do not generate any name, hash, or the like in object files, libraries, or DLLs. C function signatures do not refer to them either. This means the member names can be changed at whim - really! - provided the type and order of the member variables stay the same. In C, the two structures in the example are equivalent, since packing does not change, which means that the following abuse is perfectly valid in C, and there's certainly much worse abuse to be found in some of the most widely-used libraries.
[EDIT2] No one should ever dare to do any of the following in C++
Any sane compiler will produce identical memory layout for the two structs. Compilers are usually written as perfectly deterministic programs. Non-determinism would need to be added explicitly and deliberately, and I for one fail to see the benefit of doing so.
However, that does not allow you to cast a struct _a* to a struct _b* and access its data via both. AFAIK, this would still be a violation of the strict aliasing rules even if the memory layout is identical, as it would allow the compiler to reorder accesses via the struct _a* with accesses via the struct _b*, which would result in unpredictable, undefined behavior.
In practice, they most likely have the same memory layout.
In theory, since the standard doesn't say much about how padding should be applied to objects, you can't really assume anything about the padding between the elements.
Also, I can't even see why you would want to know or assume something about the padding between the members of a struct. Simply write standard, compliant C code and you'll be fine.
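One way to stay within standard C when data has to move between the two struct types is to copy member by member rather than cast pointers or rely on matching padding. A small sketch; the function name b_from_a is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef struct _a { int a; double b; char c; } a;
typedef struct _b { int d; double e; char f; } b;

/* Copies each member by name, so the result is correct whatever
   padding the compiler puts in either struct, and no object is ever
   accessed through a pointer to the other type. */
b b_from_a(const a *src)
{
    assert(src != NULL);
    b dst;
    dst.d = src->a;
    dst.e = src->b;
    dst.f = src->c;
    return dst;
}
```

For example, given `a x = { 1, 2.5, 'z' };`, the call `b_from_a(&x)` yields a b whose members d, e and f hold 1, 2.5 and 'z'.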