How to legally use type-punning with unions to cas

2019-04-10 16:45发布

问题:

POSIX intends pointers to variations of struct sockaddr to be castable, however depending on the interpretation of the C standard this may be a violation of the strict aliasing rule and therefore UB. (See this answer with comments below it.) I can, at least, confirm that there may at least be a problem with gcc: this code prints Bug! with optimization enabled, and Yay! with optimization disabled:

#include <sys/types.h>
#include <netinet/in.h>
#include <stdio.h>

sa_family_t test(struct sockaddr *a, struct sockaddr_in *b)
{
    a->sa_family = AF_UNSPEC;
    b->sin_family = AF_INET;
    return a->sa_family; // AF_INET please!
}

int main(void)
{
    struct sockaddr addr;
    sa_family_t x = test(&addr, (struct sockaddr_in*)&addr);
    if(x == AF_INET)
        printf("Yay!\n");
    else if(x == AF_UNSPEC)
        printf("Bug!\n");
    return 0;
}

Observe this behavior on an online IDE.

To workaround this problem this answer proposes the use of type punning with unions:

/*! Multi-family socket end-point address. */
typedef union address
{
    struct sockaddr sa;
    struct sockaddr_in sa_in;
    struct sockaddr_in6 sa_in6;
    struct sockaddr_storage sa_stor;
}
address_t;

However, apparently things are still not as simple as they look… Quoting this comment by @zwol:

That can work but takes a fair bit of care. More than I can fit into this comment box.

What kind of fair bit of care does it take? What are the pitfalls of the use of type punning with unions to cast between variations of struct sockaddr?

I prefer to ask than to run into UB.

回答1:

Using a union like this is safe,

from C11 §6.5.2.3:

  1. A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

and

  1. One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members

(highlighted what I think is most important)

With accessing the struct sockaddr member, you will read from the common initial part.


Note: This will not make it safe to pass pointers to the members around anywhere and expect the compiler knows they refer to the same stored object. So the literal version of your example code might still break because in your test() the union is not known.

Example:

#include <stdio.h>

struct foo
{
    int fooid;
    char x;
};

struct bar
{
    int barid;
    double y;
};

union foobar
{
    struct foo a;
    struct bar b;
};

int test(struct foo *a, struct bar *b)
{
    a->fooid = 23;
    b->barid = 42;
    return a->fooid;
}

int test2(union foobar *a, union foobar *b)
{
    a->a.fooid = 23;
    b->b.barid = 42;
    return a->a.fooid;
}

int main(void)
{
    union foobar fb;
    int result = test(&fb.a, &fb.b);
    printf("%d\n", result);
    result = test2(&fb, &fb);
    printf("%d\n", result);
    return 0;
}

Here, test() might break, but test2() will be correct.



回答2:

Given the address_t union you propose

typedef union address
{
    struct sockaddr sa;
    struct sockaddr_in sa_in;
    struct sockaddr_in6 sa_in6;
    struct sockaddr_storage sa_stor;
}
address_t;

and a variable declared as address_t,

address_t addr; 

you can safely initialize addr.sa.sa_family and then read addr.sa_in.sin_family (or any other pair of aliased _family fields). You can also safely use addr in a call to recvfrom, recvmsg, accept, or any other socket primitive that takes a struct sockaddr * out-parameter, e.g.

bytes_read = recvfrom(sockfd, buf, sizeof buf, &addr.sa, sizeof addr);
if (bytes_read < 0) goto recv_error;
switch (addr.sa.sa_family) {
  case AF_INET:
    printf("Datagram from %s:%d, %zu bytes\n",
           inet_ntoa(addr.sa_in.sin_addr), addr.sa_in.sin_port,
           (size_t) bytes_read);
    break;
  case AF_INET6:
    // etc
}

And you can also go in the other direction,

memset(&addr, 0, sizeof addr);
addr.sa_in.sin_family = AF_INET;
addr.sa_in.sin_port = port;
inet_aton(address, &addr.sa_in.sin_addr);
connect(sockfd, &addr.sa, sizeof addr.sa_in);

It is also okay to allocate address_t buffers with malloc, or embed it in a larger structure.

What's not safe is to pass pointers to individual sub-structures of an address_t union to functions that you write. For instance, your test function ...

sa_family_t test(struct sockaddr *a, struct sockaddr_in *b)
{
    a->sa_family = AF_UNSPEC;
    b->sin_family = AF_INET;
    return a->sa_family; // AF_INET please!
}

... may not be called with (void *)a equal to (void *)b, even if this happens because the callsite passed &addr.sa and &addr.sa_in as the arguments. Some people used to argue that this should be allowed when a complete declaration of address_t was in scope when test was defined, but that's too much like "spukhafte Fernwirkung" for the compiler devs; the interpretation of the "common initial subsequence" rule (quoted in Felix's answer) adopted by the current generation of compilers is that it only applies when the union type is statically and locally involved in a particular access. You must write instead

sa_family_t test2(address_t *x)
{
    x->sa.sa_family = AF_UNSPEC;
    x->sa_in.sa_family = AF_INET;
    return x->sa.sa_family;
}

You might be wondering why it's okay to pass &addr.sa to connect then. Very roughly, connect has its own internal address_t union, and it begins with something like

int connect(int sock, struct sockaddr *addr, socklen_t len)
{
    address_t xaddr;
    memcpy(xaddr, addr, len);

at which point it can safely inspect xaddr.sa.sa_family and then xaddr.sa_in.sin_addr or whatever.

Whether it would be okay for connect to just cast its addr argument to address_t *, when the caller might not have used such a union itself, is unclear to me; I can imagine arguments both ways from the text of the standard (which is ambiguous on certain key points having to do with the exact meanings of the words "object", "access", and "effective type"), and I don't know what compilers would actually do. In practice connect has to do a copy anyway, because it's a system call and almost all memory blocks passed across the user/kernel boundary have to be copied.