Is it safe to cast a C struct to another with fewe

Published 2020-08-24 06:54

乱世女痞

Question:

I'm trying to do OOP in C (just for fun) and I've come up with a way to do data abstraction: one struct holds the public part, and a larger struct holds the public part first, followed by the private part. In the constructor I create the whole struct and return it cast to the small struct. Is this correct or could it fail?

Here is an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// PUBLIC PART (header)
typedef struct string_public {
    void (*print)( struct string_public * );
} *string;

string string_class_constructor( const char *s );
void string_class_destructor( string s );

struct {
    string (*new)( const char * );
    void (*delete)( string );
} string_class = { string_class_constructor, string_class_destructor };


// TEST PROGRAM ----------------------------------------------------------------
int main() {
    string s = string_class.new( "Hello" );
    s->print( s );
    string_class.delete( s ); s = NULL;
    return 0;
}
//------------------------------------------------------------------------------

// PRIVATE PART
typedef struct string_private {
    // Public part
    void (*print)( string );
    // Private part
    char *stringData;
} string_private;

void print( string s ) {
    string_private *sp = (string_private *)( s );
    puts( sp->stringData );
}

string string_class_constructor( const char *s ) {
    string_private *obj = malloc( sizeof( string_private ) );
    obj->stringData = malloc( strlen( s ) + 1 );
    strcpy( obj->stringData, s );
    obj->print = print;
    return (string)( obj );
}

void string_class_destructor( string s ) {
    string_private *sp = (string_private *)( s );
    free( sp->stringData );
    free( sp );
}

Answer 1:

In theory, this could be unsafe. Two separately declared structs are allowed to have different internal layouts, as there is no requirement for them to be compatible. In practice, a compiler is highly unlikely to generate different layouts for two identical member lists (unless there is an implementation-specific annotation somewhere, at which point all bets are off, but you would know about this).

The conventional solution is to take advantage of the fact that a pointer to any given struct is guaranteed to be the same as a pointer to that struct's first member (i.e. structs do not have leading padding: C11 6.7.2.1p15). That means you can force the leading members of two structs to be not only the same, but strictly compatible, by embedding a struct of a shared type, by value, in the leading position of both:

struct shared {
    int a, b, c;
};
struct Foo {
    struct shared base;
    int d, e, f;
};
struct Bar {
    struct shared base;
    int x, y, z;
};

void work_on_shared(struct shared * s) { /**/ }

//...
struct Foo * f = //...
struct Bar * b = //...
work_on_shared((struct shared *)f);
work_on_shared((struct shared *)b);

This is perfectly compliant and guaranteed to work, because packing the shared elements into a single leading struct means that only the position of the leading element of Foo or Bar is ever explicitly relied upon.
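
As a minimal runnable sketch of the shared-base idea: the member values and the helper names `sum_shared` and `shared_base_demo` below are made up for illustration and are not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>

struct shared { int a, b, c; };

struct Foo { struct shared base; int d, e, f; };
struct Bar { struct shared base; int x, y, z; };

/* Works on the common part only; never needs to know the outer type. */
static int sum_shared(const struct shared *s) {
    return s->a + s->b + s->c;
}

/* Hypothetical demo: builds one Foo and one Bar and sums their shared parts. */
static int shared_base_demo(void) {
    struct Foo f = { { 1, 2, 3 }, 4, 5, 6 };
    struct Bar b = { { 10, 20, 30 }, 7, 8, 9 };

    /* The first member sits at offset 0, so both addresses coincide. */
    assert(offsetof(struct Foo, base) == 0);
    assert((void *)&f == (void *)&f.base);

    /* The cast and the member address are equivalent; &b.base needs no cast. */
    return sum_shared((struct shared *)&f) + sum_shared(&b.base);
}
```

Taking the address of the member (`&b.base`) is usually preferable where the outer type is known, since it leaves the compiler able to check the types.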

In practice alignment isn't likely to be the problem that bites you. A much more pressing concern is aliasing (i.e. the compiler is allowed to assume pointers to incompatible types do not alias). A pointer to a struct is always compatible with a pointer to one of its member types, so the shared base strategy will give you no problems; using types that the compiler isn't forced to mark as compatible could cause it to emit incorrectly optimised code in some circumstances, which can be a very difficult Heisenbug to find if you aren't aware of it.

Answer 2:

Well, it might work, but it is not a very safe way to do things. Essentially you are just trying to ‘hide’ access to the object's private data by casting the structure short. The data is still there, it just can’t be accessed semantically. The problem with this approach is that you need to know exactly how the compiler is ordering the bytes in the structure or you will get varying results from the cast. From memory this is not defined in the C spec (someone else can correct me on this).

A better way would be to just prefix the private properties with private_ or something like that. If you really really want to limit scope, then create a static local data array inside the class’s .c file and append a ‘private’ data structure to this each time you create a new object. Essentially you are then keeping the private data inside the C module and making use of the c file scoping rules to give you your private access protection, though this is really a lot of work for nothing.
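
A rough sketch of the module-private table idea might look like the following. Everything here is an assumption made for illustration: the `id` field, the fixed `MAX_STRINGS` capacity, and the names `string_new` and `string_data` do not come from the question, and the table never releases entries.

```c
#include <assert.h>
#include <stddef.h>

/* --- would go in the header: the only thing callers see --- */
typedef struct string_public {
    int id;  /* hypothetical: index of this object's private record */
} string_public;

/* --- would go in the .c file: invisible to other translation units --- */
#define MAX_STRINGS 16  /* arbitrary capacity chosen for the sketch */

struct private_record {
    const char *data;  /* borrowed pointer; a real class would copy it */
};

static struct private_record private_table[MAX_STRINGS];
static string_public handles[MAX_STRINGS];
static int used = 0;

string_public *string_new(const char *s) {
    if (used == MAX_STRINGS) return NULL;  /* table full */
    handles[used].id = used;
    private_table[used].data = s;          /* "append" the private record */
    return &handles[used++];
}

const char *string_data(const string_public *s) {
    assert(s != NULL && s->id >= 0 && s->id < used);
    return private_table[s->id].data;
}
```

Because `private_table` has internal linkage, code outside this file can hold a `string_public *` but has no way to name the private record.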

Also, your OO design is a bit confusing. The string class is really a string factory object creating string objects, and it would be clearer if you separated these two things.

Answer 3:

Here's what I would do if you're really intent on hiding the definition of string_private.

First, you should declare the struct containing the class definition as extern, or it will be duplicated in every translation unit that includes the header. Move its definition to the .c file. Otherwise, very little changes in the public interface.

string_class.h:

#ifndef STRING_CLASS_H
#define STRING_CLASS_H
// PUBLIC PART (header)
typedef struct string_public {
    void (*print)( struct string_public * );
} *string;

string string_class_constructor( const char *s );
void string_class_destructor( string s );

typedef struct {
    string (*new)( const char * );
    void (*delete)( string );
} string_class_def; 

extern string_class_def string_class;

#endif

In the string_class source, declare a private structure type, not seen outside the translation unit. Make the public type a member of that struct. Constructor will allocate the private struct object, but return a pointer to the public object contained within. Use offsetof magic to cast from public back to private.

string_class.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include "string_class.h"

typedef struct string_private {
    char *string;                 /* private data */
    struct string_public public;  /* public part, handed out to callers */
} string_private;

string_class_def string_class = { string_class_constructor, string_class_destructor };

void print( string s ) {
    /* this ugly cast is where the "Magic"  happens.  Basically,
       it converts the string into a char pointer so subtraction will
       work on byte boundaries.  Then subtracts the offset of public 
       from the start of string_private to back up to a pointer to 
       the private object. "offsetof" should be in <stddef.h>*/
    string_private *sp = (string_private *)( (char*) s - offsetof(struct string_private, public));
    // Private part
    puts( sp->string );
}

string string_class_constructor( const char *s ) {
    string_private *obj = malloc( sizeof( string_private ) );
    obj->string = malloc( strlen( s ) + 1 );
    strcpy( obj->string, s );
    obj->public.print = print;
    return (string)( &obj->public );
}

void string_class_destructor( string s ) {
    string_private *sp = (string_private *)( (char*) s - offsetof(struct string_private, public));
    free( sp->string );
    free( sp );
}

Usage goes unchanged...

main.c:

#include <stdlib.h> // just for NULL
#include "string_class.h"

// TEST PROGRAM ----------------------------------------------------------------
int main() {
    string s = string_class.new( "Hello" );
    s->print( s );
    string_class.delete( s ); s = NULL;
    return 0;
}
//------------------------------------------------------------------------------

Answer 4:

C does not guarantee that it will work, but generally it does. In particular, C explicitly leaves most aspects of the representation of struct values unspecified (C99 6.2.6.1), including whether the representation of values of your smaller struct will be the same as the layout of the corresponding initial members of the larger struct.

If you want an approach that C guarantees will work, then give your subclass a member of its superclass's type (not a pointer to such). For example,

typedef struct string_private {
    struct string_public parent;
    char *string;
} string_private;

That requires different syntax for accessing "inherited" members, but you can be absolutely sure that ...

string_private *my_string;
/* ... initialize my_string ... */
function_with_string_parameter((string) my_string);

... works (given that you have typedefed "string" as struct string_public *). Moreover, you can even avoid casts like so:

function_with_string_parameter(&my_string->parent);
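
Put together, the parent-member approach could be sketched as below. The `length` method and the names `string_create` and `string_destroy` are hypothetical stand-ins chosen for this example; they are not the question's `print`, constructor, or destructor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Public part: callers only ever see this type. */
typedef struct string_public {
    size_t (*length)(struct string_public *);
} *string;

/* Private part: the parent must be the FIRST member for the casts to be valid. */
typedef struct string_private {
    struct string_public parent;
    char *data;
} string_private;

static size_t length_impl(string s) {
    assert(s != NULL);
    /* Valid because s really points at the parent member of a string_private. */
    string_private *sp = (string_private *)s;
    return strlen(sp->data);
}

string string_create(const char *text) {
    string_private *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->data = malloc(strlen(text) + 1);
    if (obj->data == NULL) { free(obj); return NULL; }
    strcpy(obj->data, text);
    obj->parent.length = length_impl;
    return &obj->parent;  /* no cast needed on the way out */
}

void string_destroy(string s) {
    string_private *sp = (string_private *)s;
    if (sp == NULL) return;
    free(sp->data);
    free(sp);
}
```

A caller would write `string s = string_create("Hello"); s->length(s); string_destroy(s);` without ever seeing `string_private`.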

How useful any of this may be is an altogether different question, however. Using object-oriented programming is not an appropriate objective in itself. OO is a tool for organizing your code that has some notable advantages, but you can write in OO style without mimicking the specific syntax of any particular OO language.

Answer 5:

In most cases, this is all right with an initial sequence of any length, since all known compilers will give the common members of the two structs the same padding. If they didn't give them the same padding, they'd have a hell of a time following this requirement of the C standard:

One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them.

I really can't imagine how a compiler would handle this if the "initial sequence" would be padded differently in the two structs.
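
For reference, the union guarantee quoted above can be exercised as in this small sketch. The union name `any_struct` and the helper functions are assumptions made for the example; the two structs mirror the ones used in the example further down.

```c
#include <assert.h>

struct smaller_struct { int memb1; int memb2; };
struct larger_struct  { int memb1; int memb2; int additional_memb; };

/* The union declaration must be visible wherever the access happens. */
union any_struct {
    struct smaller_struct s;
    struct larger_struct  l;
};

/* Inspects the common initial sequence through the "other" member. */
static int read_memb1(const union any_struct *u) {
    return u->s.memb1;
}

/* Hypothetical demo: writes through l, reads the shared members through s. */
static int union_demo(void) {
    union any_struct u;
    u.l.memb1 = 1;
    u.l.memb2 = 2;
    u.l.additional_memb = 3;
    assert(read_memb1(&u) == 1);
    return u.s.memb1 + u.s.memb2;
}
```

Note that this only covers access through the union object itself; it does not by itself bless casting between unrelated struct pointers.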

But there is one serious "but": strict aliasing has to be turned off for this setup to work reliably (for example with -fno-strict-aliasing on GCC or Clang).

Strict aliasing is a rule that basically states that two pointers of incompatible types cannot reference the same memory location. Therefore, if you cast a pointer to your larger struct to a pointer to the smaller one (or vice versa), read a member of their initial sequence through one pointer, change that value through the other, and then read it again through the first pointer, the compiler may behave as if it had not changed. For example:

struct smaller_struct {
    int memb1;
    int memb2;
};

struct larger_struct {
    int memb1;
    int memb2;
    int additional_memb;
};

/* ... */

struct larger_struct l_struct, *p_l_struct;
struct smaller_struct *p_s_struct;

p_l_struct = &l_struct;
p_s_struct = (struct smaller_struct *)p_l_struct;

p_l_struct->memb1 = 1;
printf("%d", p_l_struct->memb1); /* Outputs 1 */

p_s_struct->memb1 = 2;

printf("%d", p_l_struct->memb1); /* May output 1 with strict aliasing enabled; outputs 2 with it disabled */

You see, a compiler that performs strict-aliasing optimisations (GCC enables them at -O2 and above) wants to make life easier for itself: it assumes that two pointers of incompatible types cannot reference the same memory location. So when you read p_l_struct->memb1 the second time, it may conclude that nothing has changed the value since it was set to 1, skip re-reading memory, and just output 1.

A way to circumvent this could be declaring your pointers as pointing to volatile data (which means telling the compiler that this data can be changed from elsewhere without it noticing), but the standard doesn't guarantee this to work.

Please note that all said above applies to structs that are not packed in a special way by the compiler.

Answer 6:

Whether or not this code will work on a given compiler depends upon the quality, target platform, and intended usage of the compiler in question. There are two places you might run into trouble:

1. On some platforms, the fastest way to write the last member of a structure may disturb padding bits or bytes that follow it. If that object is part of the Common Initial Sequence shared with a longer structure, and bits that were used as padding in the shorter one are used to hold meaningful data in the longer one, such data might get disturbed when writing the last field in the shorter type. I don't think I've seen any compilers actually do this, but the behavior would be allowable, which is why the CIS rule only allows for "inspection" of common members.
2. While quality compilers should seek to uphold the Common Initial Sequence guarantees in useful fashion, the Standard treats support for such things as a Quality of Implementation issue, and it has become more fashionable for some compilers to interpret N1570 6.5p7 in the lowest-quality fashion they think the Standard would allow, unless invoked with -fno-strict-aliasing. From my observation, icc seems to support the CIS guarantees in -fstrict-aliasing mode, but both gcc and clang process a low-quality dialect that for all practical purposes ignores the Common Initial Sequence rule even in cases where pointers are never aliased within their respective lifetimes.

Use a good compiler and your code will work. Use a poor-quality compiler, or one that is configured to behave in poor-quality fashion, and your code will fail.

Answer 7:

Casting from one struct to another is unreliable because the types are incompatible. What you can rely on, though, is that if the members of the parent struct all appear at the top of the child struct, in the same order, then casting a pointer to the child into a pointer to the parent (a reinterpreting cast) will let you do what you want. Like so:

struct parent {
  int data;
  char *more_data;
};

struct child {
  int data;
  char *more_data;
  double even_more_data;
};

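int main() {
  struct child c = {0};

  /* bad: does not even compile, C cannot cast between struct types */
  /* struct parent p1 = (struct parent) c; */

  struct parent p2 = *(struct parent *) &c; /* good: reinterpret through a pointer */
  (void) p2; /* silence the unused-variable warning */
  return 0;
}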

This is essentially how Python implements its object-oriented programming at the C level.

Answer 8:

If I remember correctly, this type of casting is undefined behaviour per the standard. But, GCC and MS C both guarantee that this will work as you think.

So, for example:

#include <stdint.h>

struct small_header {
    char     ident[5];
    uint32_t header_size;
};

struct bigger_header {
    char     ident[5];
    uint32_t header_size;
    uint32_t important_number;
};

You can cast pointers to them back and forth and access the first two members safely. Of course, if you have a small one and cast it to the big one, accessing the important_number member will get you UB.

Edit:

This guy wrote a nice article about this:

Type punning isn't funny: Using pointers to recast in C is bad.

Answer 9:

Another elegant way to extend structs with a common part (like OOP)

#define BASE_T \
    int a;     \
    int b;     \
    int c;

struct Base_t {
    BASE_T
};
struct Foo_t {
    BASE_T
    int d, e, f;
};
struct Bar_t {
    BASE_T
    int x, y, z;
};

void doBaseStuff(struct Base_t * pBase) {
    pBase->a = 1;
    pBase->b = 2;
    pBase->c = 3;
}

int main() {
    struct Foo_t foo;
    struct Bar_t bar;
    doBaseStuff((struct Base_t*) &foo);
    doBaseStuff((struct Base_t*) &bar);
    bar.a = 0; // I can directly access on properties of BASE_T, without doing any cast
    foo.e = 6;
    return 0;
}

This code is compatible with C89 (apart from the // comment) and C99, but do not add any spaces after the line-continuation backslashes (\) in BASE_T.