Is struct packing deterministic?

Posted 2020-02-08 03:06

For example, say I have two equivalent structs a and b in different projects:

typedef struct _a
{
    int a;
    double b;
    char c;
} a;

typedef struct _b
{
    int d;
    double e;
    char f;
} b;

Assuming I haven't used any directives like #pragma pack and these structs are compiled on the same compiler on the same architecture, will they have identical padding between variables?

Tags: c padding
8 answers
ゆ 、 Hurt°
#2 · 2020-02-08 03:44

ISO C says that two struct types in different translation units are compatible if they have the same tag and members. More precisely, here is the exact text from the C99 standard:

6.2.7 Compatible type and composite type

Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.5 for declarators. Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are complete types, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types, and such that if one member of a corresponding pair is declared with a name, the other member is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values.

This seems strange if read as implying that tag or member names could affect padding. But the rules are simply as strict as they can possibly be while still allowing the common case: multiple translation units sharing the exact text of a struct declaration via a header file. Programs that follow looser rules aren't wrong; they just rely on guarantees that come from somewhere other than the standard.

In your example, you are running afoul of the language rules, by having only structural equivalence, but not equivalent tag and member names. In practice, this is not actually enforced; struct types with different tags and member names in different translation units are de facto physically compatible anyway. All sorts of technology depends on this, such as bindings from non-C languages to C libraries.
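If you do rely on that de facto compatibility, the layout assumption can at least be checked at compile time. This is a minimal sketch assuming a C11 compiler. The tags a_s and b_s are stand-ins for _a and _b (identifiers with a leading underscore are reserved at file scope), and layouts_match is an illustrative helper, not part of any library.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stddef.h>   /* offsetof */

/* The two structs from the question, with non-reserved tags. */
typedef struct a_s { int a; double b; char c; } a;
typedef struct b_s { int d; double e; char f; } b;

/* Compilation stops here if the sizes or any member offsets differ. */
static_assert(sizeof(a) == sizeof(b), "a and b differ in size");
static_assert(offsetof(a, a) == offsetof(b, d), "first member offset differs");
static_assert(offsetof(a, b) == offsetof(b, e), "second member offset differs");
static_assert(offsetof(a, c) == offsetof(b, f), "third member offset differs");

/* Runtime form of the same comparison: returns 1 when the layouts agree. */
int layouts_match(void)
{
    return sizeof(a) == sizeof(b)
        && offsetof(a, a) == offsetof(b, d)
        && offsetof(a, b) == offsetof(b, e)
        && offsetof(a, c) == offsetof(b, f);
}
```

These checks only confirm that the two layouts agree for the compiler and options in use. They do not make the two types compatible in the standard's sense.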

If both your projects are in C (or C++), it would probably be worth the effort to try to put the definition into a common header.

It's also a good idea to put in some defense against versioning issues, such as a size field:

#include <stddef.h> // for size_t

// Widely shared definition between projects affecting interop!
// Do not change any of the members.
// Add new ones only at the end!
typedef struct a
{
    size_t size; // of whole structure
    int a;
    double b;
    char c;
} a;

The idea is that whoever constructs an instance of a must initialize the size field to sizeof (a). Then when the object is passed to another software component (perhaps from the other project), it can check the size against its sizeof (a). If the size field is smaller, then it knows that the software which constructed a is using an old declaration with fewer members. Therefore, the nonexistent members must not be accessed.
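The check described above might look roughly like this. It is a sketch, not a definitive implementation: a_v1 and a_v2 are hypothetical names for the old and new declarations (in real projects both would simply be called a), and the appended member extra and the helper read_extra are invented for illustration.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stddef.h>   /* size_t */

/* Old declaration, as compiled into an older component. */
typedef struct a_v1 {
    size_t size;   /* of whole structure */
    int a;
    double b;
    char c;
} a_v1;

/* Newer declaration: one hypothetical member appended at the end. */
typedef struct a_v2 {
    size_t size;   /* of whole structure */
    int a;
    double b;
    char c;
    double extra;  /* hypothetical new member */
} a_v2;

/* The scheme only works if the new declaration is actually larger.
   A small member that fits into the old struct's tail padding would
   leave sizeof unchanged, so make the assumption explicit. */
static_assert(sizeof(a_v2) > sizeof(a_v1), "new version must be larger");

/* Returns p->extra only when the producer's declaration included it;
   otherwise returns the caller-supplied fallback. */
double read_extra(const a_v2 *p, double fallback)
{
    if (p->size < sizeof(a_v2))
        return fallback;   /* built against an older declaration */
    return p->extra;
}
```

A producer built against the old declaration would store sizeof (a_v1) in size, so read_extra never touches memory past the end of the object it was given.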

forever°为你锁心
#3 · 2020-02-08 03:49

The compiler is deterministic; if it weren't, separate compilation would be impossible. Two different translation units with the same struct declaration will work together; that is guaranteed by §6.2.7/1: Compatible type and composite type.

Moreover, two different compilers on the same platform should interoperate, although this is not guaranteed by the standard. (It's a quality of implementation issue.) To allow interoperability, compiler writers agree on a platform ABI (Application Binary Interface), which includes a precise specification of how composite types are represented. In this way, it is possible for a program compiled with one compiler to use library modules compiled with a different compiler.

But you are not just interested in determinism; you also want the layout of two different types to be the same.

According to the standard, two struct types are compatible if their members (taken in order) are compatible, and if their tags and member names are the same. Since your example structs have different tags and names, they are not compatible even though their member types are, so you cannot use one where the other is required.
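One way to stay within the rules is to convert between the types member by member instead of aliasing one as the other. This is a minimal sketch: the tags a_s and b_s and the helper b_from_a are illustrative names, not anything from the question or a library.

```c
#include <assert.h>   /* assert */

/* The two structs from the question, with non-reserved tags. */
typedef struct a_s { int a; double b; char c; } a;
typedef struct b_s { int d; double e; char f; } b;

/* Copies each member by name, so nothing here depends on padding,
   alignment, or the two layouts happening to match. */
b b_from_a(const a *src)
{
    b dst;
    dst.d = src->a;
    dst.e = src->b;
    dst.f = src->c;
    return dst;
}
```

The cost is one copy per conversion, in exchange for code that is valid on any conforming implementation.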

It may seem odd that the standard allows tags and member names to affect compatibility. The standard requires that the members of a struct be laid out in declaration order, so names cannot change the order of members within the struct. Why, then, could they affect padding? I don't know of any compiler where they do, but the standard's flexibility is based on the principle that the requirements should be the minimum necessary to guarantee correct execution. Aliasing differently tagged structs is not permitted within a translation unit, so there is no need to condone it between different translation units. And so the standard does not allow it. (It would be legitimate for an implementation to insert information about the type in a struct's padding bytes, even if it needed to deterministically add padding to provide space for such information. The only restriction is that padding cannot be placed before the first member of a struct.)
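The two layout guarantees mentioned above (no padding before the first member, and members placed in declaration order) can be expressed with offsetof. This is a small sketch; the tag a_s and the helper ordering_guarantees_hold are illustrative names only.

```c
#include <assert.h>   /* assert */
#include <stddef.h>   /* offsetof */

/* The first struct from the question, with a non-reserved tag. */
typedef struct a_s { int a; double b; char c; } a;

/* Returns 1 when the first member sits at offset 0 and later members
   have strictly increasing offsets. How much padding lies between or
   after the members is left to the implementation. */
int ordering_guarantees_hold(void)
{
    return offsetof(a, a) == 0
        && offsetof(a, a) < offsetof(a, b)
        && offsetof(a, b) < offsetof(a, c);
}
```

On a conforming implementation this should always return 1. The exact offsets of b and c, and the value of sizeof (a), are the parts the standard leaves open.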

A platform ABI is likely to specify the layout of a composite type without reference to its tag or member names. On a particular platform, with a platform ABI which has such a specification and a compiler documented to conform to the platform ABI, you could get away with the aliasing, although it would not be technically correct, and obviously the preconditions make it non-portable.
