Type punning a struct in C and C++ via a union

Published 2019-01-25 08:52

Question:

I've compiled this with gcc and g++ using -pedantic and I don't get a warning from either one:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct a {
    struct a *next;
    int i;
};

struct b {
    struct b *next;
    int i;
};

struct c {
    int x, x2, x3;
    union {
        struct a a;
        struct b b;
    } u;
};

void foo(struct b *bar) {
    bar->next->i = 9;
    return;
}

int main(int argc, char *argv[]) {
    struct c c;
    memset(&c, 0, sizeof c);
    c.u.a.next = (struct a *)calloc(1, sizeof(struct a));
    foo(&c.u.b);
    printf("%d\n", c.u.a.next->i);
    return 0;
}

Is this legal to do in C and C++? I've read about the type-punning but I don't understand. Is foo(&c.u.b) any different from foo((struct b *)&c.u.a)? Wouldn't they be exactly the same? This exception for structs in a union (from C89 in 3.3.2.3) says:

If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them. Two structures share a common initial sequence if corresponding members have compatible types for a sequence of one or more initial members.

In the union, the first member of struct a is struct a *next, and the first member of struct b is struct b *next. As you can see, a value is written through struct a *next, and then in foo it is read through struct b *next. Are they compatible types? They're both pointers to a struct, and pointers to any struct should be the same size, so they should be compatible and the layout should be the same, right? Is it OK to read i from one struct and write to the other? Am I committing any kind of aliasing or type-punning violation?

Answer 1:

In C:

struct a and struct b are not compatible types. Even in

typedef struct s1 { int x; } t1, *tp1;
typedef struct s2 { int x; } t2, *tp2;

struct s1 and struct s2 are not compatible types. (See the example in 6.7.8/p5.) An easy test: if two struct types are compatible, then a value of one type can be assigned to an object of the other type. If you would expect the compiler to complain when you try that, then they are not compatible types.
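The assignment test above can also be checked mechanically. The following is a minimal sketch, assuming a C11 compiler: `_Generic` selects an association by type compatibility, so it reports whether a pointer type is compatible with `struct s1 *`. The macro `IS_PTR_TO_S1` and the three function names are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>

typedef struct s1 { int x; } t1, *tp1;
typedef struct s2 { int x; } t2, *tp2;

/* Hypothetical helper: evaluates to 1 when the type of p is compatible
   with "struct s1 *", and to 0 otherwise. _Generic does not evaluate p. */
#define IS_PTR_TO_S1(p) _Generic((p), struct s1 *: 1, default: 0)

/* tp1 and "t1 *" both name pointer-to-struct-s1, so both match. */
int t1_matches_s1(void) {
    tp1 p = 0;
    t1 *q = 0;
    return IS_PTR_TO_S1(p) && IS_PTR_TO_S1(q);
}

/* tp2 points to struct s2, which is a distinct, incompatible type
   even though its member list is identical. */
int t2_matches_s1(void) {
    tp2 p = 0;
    return IS_PTR_TO_S1(p);
}

/* The assignment test from the text: t1 and struct s1 are the same type. */
int copy_between_compatible_types(void) {
    t1 v = { 42 };
    struct s1 w = v;          /* accepted by the compiler */
    /* struct s2 bad = v; */  /* would be rejected: incompatible types */
    return w.x;
}
```

Calling `t1_matches_s1()` yields 1 and `t2_matches_s1()` yields 0, which mirrors the claim that identical member lists do not make two separately declared struct types compatible.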

Therefore, struct a * and struct b * are also not compatible types, and so struct a and struct b do not share a common initial sequence. Your union punning is instead governed by the same rule that applies to union punning in other cases (6.5.2.3, footnote 95):

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
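As an illustration of the reinterpretation this footnote describes, here is a minimal C sketch that stores a float in a union and reads the same bytes back as an integer. The names `float_bits` and `bits_of` are hypothetical, and the sketch assumes that float is a 32-bit IEEE 754 type; the footnote is from the C standard, so this says nothing about C++.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

union float_bits {
    float    f;
    uint32_t u;
};

/* Stores through member f, then reads through member u. The object
   representation of the float is reinterpreted as a uint32_t. */
uint32_t bits_of(float value) {
    union float_bits fb;
    fb.f = value;
    return fb.u;
}
```

On an IEEE 754 platform, `bits_of(1.0f)` gives `0x3F800000`. uint32_t has no padding bits and therefore no trap representations, which is why the last sentence of the footnote is not a concern in this particular direction.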

In C++, struct a and struct b also do not share a common initial sequence. [class.mem]/p18 (quoting N4140):

Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

[basic.types]/p9:

If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types. [ Note: Layout-compatible enumerations are described in 7.2. Layout-compatible standard-layout structs and standard-layout unions are described in 9.2. —end note ]

struct a * and struct b * are neither structs nor unions nor enumerations; therefore they are only layout-compatible if they are the same type, which they are not.

It is true that ([basic.compound]/p3)

Pointers to cv-qualified and cv-unqualified versions (3.9.3) of layout-compatible types shall have the same value representation and alignment requirements (3.11).

But that does not mean those pointer types are layout-compatible types, as that term is defined in the standard.

Answer 2:

What you could do (and I've been bitten by this before) is declare the initial pointer in both structs as void * and cast as needed. Since void * is convertible to and from any object pointer type, you would only pay an ugliness tax and would not risk gcc reordering your operations, which I have seen happen even with a union, as a result of compiler bugs in some versions. As @T.C. correctly points out, layout compatibility of given types means that they are convertible at the language level; types that merely happen to have the same size are not necessarily layout-compatible, and an aggressive compiler may make assumptions on that basis.
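A minimal sketch of the void * approach, under stated assumptions: the union tag `ab` and the function names `foo` and `void_ptr_demo` are hypothetical, and the node is cast back to struct a * because that is the type it was allocated as. With both next members declared as void *, the two structs have members of identical types, so they do share a common initial sequence.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct a { void *next; int i; };
struct b { void *next; int i; };

union ab {
    struct a a;
    struct b b;
};

/* Takes the union itself, so the access is made through the union type. */
static void foo(union ab *u) {
    /* next was stored through member a and is read through member b;
       both are void *, which is part of the common initial sequence. */
    struct a *node = (struct a *)u->b.next;  /* cast back to the allocated type */
    node->i = 9;
}

/* Returns the value written by foo, or -1 if the allocation fails. */
int void_ptr_demo(void) {
    union ab u;
    int result;

    memset(&u, 0, sizeof u);
    u.a.next = calloc(1, sizeof(struct a));
    if (u.a.next == NULL)
        return -1;

    foo(&u);
    result = ((struct a *)u.a.next)->i;
    free(u.a.next);
    return result;
}
```

The cost is the one the answer names: every use of next needs a cast, and the compiler no longer checks that the pointer really refers to the struct type you cast it to.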

Answer 3:

I've had a similar question some time ago, and I think I can answer yours.

Yes, struct a and struct b are not compatible types, and pointers to them are also incompatible.

Yes, what you are doing is illegal even from the outdated point of view of the C89 standard. However, it may be interesting to note that if you reverse the order of elements in struct a and struct b, you would be able to access int i of a struct c instance (but not access its next pointer in any way, i.e. bar->i = 9; instead of bar->next->i = 9;), but only from the C89 standard's point of view.

But even if you reverse the order of elements in the two structs, what you're doing would still be illegal from the point of view of the C99 and C11 standards (as interpreted by the committee). In C99, the part of the standard you quoted was changed to this:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible.

The last phrase is a bit ambiguous, since you can interpret "visible" in several ways, but, according to the committee, it means that the inspection should be performed on an object of the union type in question.

So, in your case, the correct way to handle this would be something along the lines of:

struct a {
    int i;
    struct a *next;
};

struct b {
    int i;
    struct b *next;
};

union un {
    struct a a;
    struct b b;
};

struct c {
    int x, x2, x3;
    union un u;
};

/* ... */

void foo(union un *bar) {
    bar->b.next->i = 9; /* This is the "inspection" operation */
    return;
}

/* ... */

foo(&c.u);
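For reference, here is a complete, compilable variant of the snippet above, written as a sketch rather than as the answer's own code. It touches only the member i, since with this member order i is the entire common initial sequence (the next members still have incompatible types). The function names `read_i` and `common_initial_demo` are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct a { int i; struct a *next; };
struct b { int i; struct b *next; };

union un {
    struct a a;
    struct b b;
};

struct c {
    int x, x2, x3;
    union un u;
};

/* Inspects the common initial member through the union type, where the
   completed declaration of union un is visible. */
static int read_i(const union un *bar) {
    return bar->b.i;
}

int common_initial_demo(void) {
    struct c c;
    memset(&c, 0, sizeof c);
    c.u.a.i = 9;          /* the union currently holds a struct a */
    return read_i(&c.u);  /* i is read through the struct b member */
}
```

Here `common_initial_demo()` returns 9: the value stored through `c.u.a.i` is read back through `c.u.b.i`.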

That is all fine and interesting from the language-lawyer point of view, but in practice, unless you apply different packing settings to them, structs with the same initial sequence will lay it out identically (in 99.9% of cases). In fact, they should have the same layout even in your original setup, since pointers to struct a and struct b should have the same size. So, if your compiler doesn't get nasty when you break strict aliasing, you can more or less safely cast between them, or use them in a union the way you're using them now.

EDIT: as noted by @underscore_d in the comments to this answer, the corresponding clauses of the C++ standards do not contain the phrase "anywhere that a declaration of the completed type of the union is visible", so the C++ standard may take the same stance on the subject as the C89 standard.