
Is pointer arithmetic on inactive member of a unio

Posted 2019-01-27 14:21

Question:

Let's consider this example code:

struct sso
{
    union {
        struct {
            char* ptr;
            char size_r[8];
        } large_str;
        char short_str[16];
    };

    const char* get_tag_ptr() const {
        return short_str+15;
    }
};

In [expr.add] it is specified that pointer arithmetic is allowed as long as the result points to another element of the array (or past the end of an object or of the last element). Nevertheless, this section does not specify what happens if the array is an inactive member of a union. I believe this is not an issue and that short_str+15 is never UB. Is that right?

The following question clearly shows my intent

Answer 1:

By writing return short_str+15;, you take the address of an object whose lifetime may not have started, but this does not result in undefined behavior unless you dereference it.

[basic.life]/1.2

if the object is a union member or subobject thereof, its lifetime only begins if that union member is the initialized member in the union, or as described in [class.union].

and

[class.union]/1

In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended ([basic.life]). At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

but

[basic.life]/6

Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise, such a pointer refers to allocated storage ([basic.stc.dynamic.allocation]), and using the pointer as if the pointer were of type void*, is well-defined. Indirection through such a pointer is permitted but the resulting lvalue may only be used in limited ways, as described below.
- [list unrelated to unions]



Answer 2:

Whether pointer arithmetic on union members will lead to aliasing depends upon how the pointers will end up being used. On implementations which supplement the Standard with a guarantee that "type-access" rules will only be applied in cases where there is actual aliasing, or (for C++) in cases involving types with non-trivial semantics, the validity of pointer operations would have little to do with whether they are performed upon active or inactive members.

Consider, for example:

#include <stdint.h>

uint32_t readU(uint32_t *p) { return *p; }
void writeD(double *p, double v) { *p = v; }

union udBlob { double dd[2]; uint32_t ww[4]; } udb;

uint32_t noAliasing(int i, int j)
{
  if (readU(udb.ww+i))
    writeD(udb.dd+j, 1.0);
  return readU(udb.ww+i);
}

uint32_t aliasesUnlessDisjoint(int i, int j)
{
  uint32_t *up = udb.ww+i;
  double *dp = udb.dd+j;

  if (readU(up))
    writeD(dp, 1.0);
  return readU(up);
}

During the execution of readU, no storage that is accessed via *p will be accessed via any other means, so there is no aliasing during the execution of that function. The same holds during the execution of writeD. During the execution of noAliasing, all operations that affect any storage associated with udb are performed using pointers that are all derived from udb and whose active lifetimes clearly do not overlap, so there is no aliasing there either.

During the execution of aliasesUnlessDisjoint, all accesses are performed using pointers which are derived from udb, but storage is accessed via up between the creation and use of dp, and storage is accessed via dp between the creation and use of up. Consequently, *dp and *up will alias during the execution of aliasesUnlessDisjoint unless udb.ww[i] and udb.dd[j] occupy disjoint storage.
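The "disjoint storage" condition can be made concrete with a little byte arithmetic. The following is only an illustrative sketch, not part of the original answer: the helper name shares_storage is hypothetical, and it assumes that a double is exactly twice the size of a uint32_t (typically 8 and 4 bytes). Since both arrays start at the same address inside the union, ww[i] and dd[j] overlap exactly when their byte ranges intersect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption of this sketch: two 32-bit words fit exactly into one double.
static_assert(sizeof(double) == 2 * sizeof(std::uint32_t),
              "sketch assumes an 8-byte double and a 4-byte uint32_t");

// Hypothetical helper: true when ww[i] and dd[j] of udBlob share a byte.
// ww[i] covers bytes [i*4, i*4+4), dd[j] covers bytes [j*8, j*8+8).
inline bool shares_storage(std::size_t i, std::size_t j) {
    assert(i < 4 && j < 2);  // bounds of ww[4] and dd[2]
    return i * sizeof(std::uint32_t) < (j + 1) * sizeof(double) &&
           j * sizeof(double) < (i + 1) * sizeof(std::uint32_t);
}
```

Under these assumptions, ww[0] and ww[1] overlap dd[0], while ww[2] and ww[3] overlap dd[1]; any other pairing is disjoint.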

Note that both gcc and clang apply type-access rules even in cases like the noAliasing function above, where there is no actual aliasing. Despite the fact that the Standard explicitly says that an lvalue expression of the form someArray[y] is equivalent to *(someArray+(y)), gcc and clang will only allow reliable access to array members within a union if the [] syntax is used. For example:

uint32_t noAliasing2(int i, int j)
{
  if (udb.ww[i])
    udb.dd[j] = 1.0;
  return udb.ww[i];
}
uint32_t noAliasing3(int i, int j)
{
  if (*(udb.ww+i))
    *(udb.dd+j) = 1.0;
  return *(udb.ww+i);
}

Although the code produced by gcc or clang for noAliasing2 will reload udb.ww[i] after the operation on udb.dd[j], the code for noAliasing3 will not. This is technically allowable under the Standard (since the rules, as written, don't allow udb.ww[i] to be accessed under any circumstances!), but that in no way implies any judgment on the part of the authors that the behavior of gcc and clang is appropriate in a high-quality implementation. Looking purely at the Standards, I see nothing to suggest that any particular one of the noAliasing forms should be more or less valid than any other, but programmers considering use of gcc or clang in -fstrict-aliasing mode should recognize that gcc and clang treat them differently.
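For readers who want the same effect without depending on how a given compiler treats union members, one commonly used alternative is to keep the data in a plain byte buffer and move values in and out with memcpy. This is only a sketch and not part of the original answer: the names blob, readWord, writeDouble and noAliasingMemcpy are hypothetical, and the buffer layout assumes an 8-byte double and a 4-byte uint32_t.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption of this sketch: two 32-bit words fit exactly into one double.
static_assert(sizeof(double) == 2 * sizeof(std::uint32_t),
              "sketch assumes an 8-byte double and a 4-byte uint32_t");

// Hypothetical stand-in for udb: raw bytes, aligned for double.
alignas(double) unsigned char blob[2 * sizeof(double)];

// Read the i-th 32-bit word by copying its bytes out of the buffer.
std::uint32_t readWord(int i) {
    assert(i >= 0 && i < 4);
    std::uint32_t w;
    std::memcpy(&w, blob + i * sizeof w, sizeof w);
    return w;
}

// Write the j-th double by copying its bytes into the buffer.
void writeDouble(int j, double v) {
    assert(j >= 0 && j < 2);
    std::memcpy(blob + j * sizeof v, &v, sizeof v);
}

// Same control flow as noAliasing, expressed through byte copies.
std::uint32_t noAliasingMemcpy(int i, int j) {
    if (readWord(i))
        writeDouble(j, 1.0);
    return readWord(i);  // sees the bytes written by writeDouble, if any
}
```

Because every access goes through unsigned char storage, the second readWord has to observe the bytes stored by writeDouble whenever the two ranges overlap, with or without -fstrict-aliasing.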



Answer 3:

Richard Smith (WG21 Project Editor) said that array indexing is UB when the array is outside its lifetime.
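If that reading is accepted, one way to keep a get_tag_ptr-style accessor well-defined is to avoid the union altogether, so that the array is always within its lifetime. The following is a hedged sketch rather than anything proposed in the thread: sso_bytes, kSize, set_large_ptr, large_ptr, set_short and set_tag are hypothetical names, and the layout assumes that a char* is smaller than the 16-byte buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical variant of the question's sso: one byte buffer, no union.
struct sso_bytes {
    static constexpr std::size_t kSize = 16;
    alignas(char*) char buf[kSize] = {};  // the array's lifetime never ends early

    // Large mode: the pointer is copied into and out of the leading bytes.
    void set_large_ptr(char* p) { std::memcpy(buf, &p, sizeof p); }
    char* large_ptr() const {
        char* p;
        std::memcpy(&p, buf, sizeof p);
        return p;
    }

    // Short mode: copy the characters in place, leaving room for the tag.
    void set_short(const char* s, std::size_t n) {
        assert(n < kSize);  // the last byte is reserved for the tag
        std::memcpy(buf, s, n);
    }

    // The tag lives in the last element, the same position as short_str+15.
    void set_tag(char t) { buf[kSize - 1] = t; }
    const char* get_tag_ptr() const { return buf + (kSize - 1); }
};

static_assert(sizeof(char*) < sso_bytes::kSize,
              "the stored pointer must not overlap the tag byte");
```

The trade-off is that the large-string fields are no longer named members and every access to them goes through a small copy, which compilers typically optimize into a plain load or store.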