Is it legal to implement inheritance in C by casti

Posted 2019-02-10 23:33

Question:

Now I know I can implement inheritance by casting the pointer to a struct to the type of the first member of this struct.

However, purely as a learning experience, I started wondering whether it is possible to implement inheritance in a slightly different way.

Is this code legal?

#include <stdio.h>
#include <stdlib.h>

struct base
{
    double some;
    char space_for_subclasses[];
};

struct derived
{
    double some;
    int value;
};

int main(void) {
    struct base *b = malloc(sizeof(struct derived));
    b->some = 123.456;
    struct derived *d = (struct derived*)(b);
    d->value = 4;
    struct base *bb = (struct base*)(d);
    printf("%f\t%f\t%d\n", d->some, bb->some, d->value);
    return 0;
}

This code seems to produce the desired results, but as we know, that is far from proving it is not UB.

The reason I suspect that such code might be legal is that I cannot see any alignment issues that could arise here. But of course this is far from knowing that no such issues arise, and even if there are indeed no alignment issues, the code might still be UB for some other reason.

  • Is the above code valid?
  • If it's not, is there any way to make it valid?
  • Is char space_for_subclasses[]; necessary? With this line removed, the code still seems to behave itself.

Answer 1:

This is more-or-less the same poor man's inheritance used by struct sockaddr, and it is not reliable with the current generation of compilers. The easiest way to demonstrate a problem is like this:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct base
{
    double some;
    char space_for_subclasses[];
};
struct derived
{
    double some;
    int value;
};

double test(struct base *a, struct derived *b)
{
    a->some = 1.0;
    b->some = 2.0;
    return a->some;
}

int main(void)
{
    void *block = malloc(sizeof(struct derived));
    if (!block) {
        perror("malloc");
        return 1;
    }
    double x = test(block, block);
    printf("x=%g some=%g\n", x, *(double *)block);
    return 0;
}

If a->some and b->some were allowed by the letter of the standard to be the same object, this program would be required to print x=2 some=2 (%g drops the trailing ".0"). With some compilers and under some conditions (it won't happen at all optimization levels, and you may have to move test to its own file), it will print x=1 some=2 instead.

Whether the letter of the standard does allow a->some and b->some to be the same object is disputed. See http://blog.regehr.org/archives/1466 and the paper it links to.



Answer 2:

As I read the standard, §6.2.6.1/P5 says:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. [...]

So, as long as space_for_subclasses is a char array member (which decays to a pointer to char) and you use it to read the value, you should be OK.


That said, to answer

Is char space_for_subclasses[]; necessary?

Yes, it is.

Quoting §6.7.2.1/P18,

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

Remove that and you'd be accessing invalid memory, causing undefined behavior. However, in your case (the second snippet), you're not accessing value anyway, so that is not going to be an issue here.