Is const-casting via a union undefined behaviour?

Posted 2019-06-14 23:39

Question:

Unlike C++, C has no notion of a const_cast. That is, there is no valid way to convert a const-qualified pointer to an unqualified pointer:

void const * p;
void * q = p;    // not good

First off: Is this cast actually undefined behaviour?

In any event, GCC warns about this. To make "clean" code that requires a const-cast (i.e. where I can guarantee that I won't mutate the contents, but all I have is a mutable pointer), I have seen the following "conversion" trick:

typedef union constcaster_
{
    void * mp;
    void const * cp;
} constcaster;

Usage: u.cp = p; q = u.mp;.
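For concreteness, here is a minimal sketch of that usage wrapped in a function. The helper name strip_const_via_union is made up for illustration and is not from any library; whether the read of u.mp is sanctioned by the standard is exactly what is being asked.

```c
/* The union from the question. */
typedef union constcaster_
{
    void *mp;
    void const *cp;
} constcaster;

/* Hypothetical helper (name is an assumption): store through the
 * const-qualified member, then read back through the unqualified one. */
static void *strip_const_via_union(void const *p)
{
    constcaster u;
    u.cp = p;      /* store to the const member */
    return u.mp;   /* read from the other member: the step in question */
}
```

A caller would write q = strip_const_via_union(p); in place of the plain initialization that GCC warns about.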

What are the C language rules on casting away constness through such a union? My knowledge of C is only very patchy, but I've heard that C is far more lenient about union access than C++, so while I have a bad feeling about this construction, I would like an argument from the standard (C99 I suppose, though if this has changed in C11 it'll be good to know).

Answer 1:

It's implementation-defined; see C99 6.5.2.3/5:

if the value of a member of a union object is used when the most recent store to the object was to a different member, the behavior is implementation-defined.

Update: @AaronMcDaid commented that this might be well-defined after all.

The standard specifies the following in 6.2.5/27:

Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements.27)

27) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.

And (6.7.2.1/14):

A pointer to a union object, suitably converted, points to each of its members (or if a member is a bitfield, then to the unit in which it resides), and vice versa.

One might conclude that, in this particular case, there is only room for exactly one way to access the elements in the union.
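The "same representation" wording can be probed empirically. The sketch below only checks what a given implementation does; it proves nothing about the standard, and the function name same_representation is an assumption chosen for illustration.

```c
#include <string.h>

/* Hypothetical check: do void * and void const * holding the same address
 * have the same size and the same bytes on this implementation? */
static int same_representation(void *p)
{
    void *mp = p;
    void const *cp = p;

    return sizeof mp == sizeof cp
        && memcmp(&mp, &cp, sizeof mp) == 0;
}
```

On mainstream compilers this is expected to return nonzero, which is consistent with the footnote about interchangeability as members of unions.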



Answer 2:

My understanding is that UB can arise only if you try to modify a const-declared object.

So the following code is not UB:

int x = 0;
const int *cp = &x;
int *p = (int*)cp;
*p = 1; /* OK: x is not a const object */

But this is UB:

const int cx = 0;
const int *cp = &cx;
int *p = (int*)cp;
*p = 1; /* UB: cx is const */

The use of a union instead of a cast should not make any difference here.

From the C99 specs (6.7.3 Type qualifiers):

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
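A typical situation where casting away const is harmless is a strchr-style search, which takes a pointer to const but hands back an unqualified pointer into the same buffer. The sketch below is an illustration only; find_char is a made-up name, and the function itself never writes through the pointer.

```c
#include <stddef.h>

/* Hypothetical strchr-like helper: returns a pointer to the first
 * occurrence of c in s, or NULL if there is none. */
static char *find_char(const char *s, int c)
{
    for (; *s != '\0'; ++s) {
        if (*s == (char)c) {
            /* Cast away const. Nothing is modified here; whether a later
             * write is UB depends on how the underlying object was defined. */
            return (char *)s;
        }
    }
    return NULL;
}
```

If the caller passes a modifiable array such as char buf[] = "hello";, writing through the result is fine. If the caller passes an object defined const, writing through the result is the undefined case quoted above.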



Answer 3:

The initialization certainly won't cause UB. The conversion between qualified pointer types is explicitly allowed in §6.3.2.3/2 (n1570 (C11)). It is modifying the pointed-to object through that pointer afterwards that can cause UB (see @rodrigo's answer).

However, you need an explicit cast to convert a const void* to a void*, because the constraint on simple assignment still requires that all qualifiers of the pointed-to type on the RHS also appear on the LHS.

§6.7.9/11: ... The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply, taking the type of the scalar to be the unqualified version of its declared type.

§6.5.16.1/1: (Simple Assignment / Constraints)

  • ... both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;
  • ... one operand is a pointer to an object type, and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;

I don't know why gcc just gives a warning though.
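The two directions of the conversion can be put side by side. This is a sketch with made-up helper names (to_mutable, to_const): dropping the qualifier needs the explicit cast, while adding one satisfies the constraint implicitly.

```c
/* Hypothetical helper: dropping const requires an explicit cast. */
static void *to_mutable(const void *p)
{
    /* return p;  would violate the simple-assignment constraint,
     *            so a diagnostic is required */
    return (void *)p;
}

/* Hypothetical helper: adding const needs no cast, because the pointed-to
 * type on the left then has all the qualifiers of the one on the right. */
static const void *to_const(void *p)
{
    return p;
}
```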


As for the union trick: it is not UB, but the result is probably still unspecified.

§6.5.2.3/3 fn 95: If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

§6.2.6.1/7: When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values. (* Note: see also §6.5.2.3/6 for an exception, but it doesn't apply here)


The corresponding sections in n1124 (C99) are

  • C11 §6.3.2.3/2 = C99 §6.3.2.3/2
  • C11 §6.7.9/11 = C99 §6.7.8/11
  • C11 §6.5.16.1/1 = C99 §6.5.16.1/1
  • C11 §6.5.2.3/3 fn 95 = missing ("type punning" doesn't appear in C99)
  • C11 §6.2.6.1/7 = C99 §6.2.6.1/7


Answer 4:

Don't cast it at all. It's a pointer to const, which means that attempting to modify the data is not allowed, and in many implementations doing so will crash the program if the pointer points to unmodifiable memory. Even if you know the memory can be modified, there may be other pointers to it that do not expect it to change, e.g. if it is part of the storage of a logically immutable string.

The warning is there for good reason.

If you need to modify data that you can only reach through a pointer to const, the portable, safe way is to first copy the memory it points to and then modify the copy.
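A minimal sketch of the copy-then-modify approach for a string follows. The function name copy_and_replace and its parameters are assumptions made for illustration; the caller owns the returned buffer and must free it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src in which every
 * occurrence of `from` is replaced by `to`. The original is never written.
 * Returns NULL if allocation fails. */
static char *copy_and_replace(const char *src, char from, char to)
{
    size_t len = strlen(src) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    size_t i;

    if (copy == NULL)
        return NULL;

    memcpy(copy, src, len);
    for (i = 0; i + 1 < len; ++i) {
        if (copy[i] == from)
            copy[i] = to;           /* modify only the private copy */
    }
    return copy;
}
```

This costs an allocation and a copy, but no qualifier is ever discarded, so neither the warning nor the const-object question comes up.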