Type-safe generic containers with macros

Posted 2020-03-24 03:15

Question:

I'm trying to make a type-safe generic linked list in C using macros. It should work similarly to how templates work in C++. For example,

LIST(int) *list = LIST_CREATE(int);

My first attempt was for #define LIST(TYPE) (the macro I used above) to define a struct _List_##TYPE {...}. That, however, did not work because the struct would be redefined every time I declared a new list. I remedied the problem by doing this:

/* You would first have to use this macro, which will define
   the `struct _List_##TYPE`...                               */
DEFINE_LIST(int);

int main(void)
{
    /* ... And this macro would just be an alias for the struct, it
       wouldn't actually define it.                                  */
    LIST(int) *list = LIST_CREATE(int);
    return 0;
}

/* This is what the macros look like */

#define DEFINE_LIST(TYPE)    \
    struct _List_##TYPE      \
    {                        \
        ...                  \
    }

#define LIST(TYPE)       \
    struct _List_##TYPE

But another problem is that when I have multiple files that use DEFINE_LIST(int), for example, and some of them include each other, then there will still be multiple definitions of the same struct. Is there any way to make DEFINE_LIST check if the struct has already been defined?

/* one.h */
DEFINE_LIST(int);

/* two.h */
#include "one.h"
DEFINE_LIST(int); /* Error: already defined in one.h */ 

Answer 1:

I tackled this problem in C before C++ acquired templates, and I still have the code.

You can't define a truly generic typesafe container-of-T template with macros in a way that's confined entirely to header files. The standard preprocessor provides no means of "pushing" and "popping" the macro assignments you will require so as to preserve their integrity through nested and sequential contexts of expansion. And you will encounter nested contexts as soon as you try to eat your own dog food by defining a container-of-containers-of-T.

The thing can be done, as we'll see, but as @immortal suggests, it entails generating distinct .h and .c files for each value of T that you require. You can, for example, define a completely generic list-of-T with macros in an inline file, say, list_type.inl, and then include list_type.inl in each of a pair of small set-up wrappers - list_float.h and list_float.c - that will respectively define and implement the list-of-float container. Similarly for list-of-int, list-of-list-of-float, list-of-vector-of-list-of-double, and so on.

A schematic example will make all clear. But first just get the full measure of the eat-your-own-dogfood challenge.

Consider such a second-order container as a list-of-lists-of-thingummy. We want to be able to instantiate these by setting T = list-of-thingummy for our macro list-of-T solution. But in no way is list-of-thingummy going to be a POD datatype. Whether list-of-thingummy is our own dogfood or somebody else's, it's going to be an abstract datatype that lives on the heap and is represented to its users through a typedef-ed pointer type. Or at the very least, it is going to have dynamic components held on the heap. In any case, not POD.

This means it's not enough for our list-of-T solution just to be told that T = list-of-thingummy. It must also be told whether a T requires non-POD copy-construction and destruction, and if so how to copy-construct and destroy one. In C terms, that means:

  • Copy-construction: How to create a copy of a given T in a T-sized region of uncommitted memory, given the address of such a region.

  • Destruction: How to destroy the T at a given address.

We can do without knowing about default construction or construction from non-T parameters, as we can reasonably restrict our list-of-T solution to the containment of objects copied from user-supplied originals. But we do have to copy them, and we have to dispose of our copies.
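
To make the two requirements concrete, here is a minimal sketch of a copy-construction and a destruction function for a non-POD type. The type `label` and the names `label_copy_init` and `label_dispose` are hypothetical, invented for this illustration; only the shape of the two functions follows the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical non-POD element type: it owns a heap-allocated string. */
typedef struct {
    char *text;
} label;

/* Copy-construct the label at `psrc` into the uncommitted memory at `pdest`.
   Returns `pdest` on success, or NULL on failure (leaving `pdest` untouched). */
label *label_copy_init(label *pdest, label *psrc)
{
    assert(pdest != NULL && psrc != NULL && psrc->text != NULL);
    size_t len = strlen(psrc->text) + 1;
    char *dup = malloc(len);
    if (dup == NULL) {
        return NULL;
    }
    memcpy(dup, psrc->text, len);
    pdest->text = dup;
    return pdest;
}

/* Destroy the label at `pt`, releasing the storage it owns. */
void label_dispose(label *pt)
{
    if (pt != NULL) {
        free(pt->text);
        pt->text = NULL;
    }
}
```

A container that stores `label` values would call the first function whenever it takes a copy of a user-supplied original and the second whenever it discards one of its copies.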

Next, suppose that we aspire to offer a template for set-of-T, or map-of-T1-to-T2, in addition to list-of-T. These key-ordered datatypes add another parameter we will have to plug in for any non-POD value of T or T1, namely how to order any two objects of the key type. Indeed we will need that parameter for any key datatype for which memcmp() won't do.
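
As an illustration of a key type for which memcmp() won't do, consider a key that refers to its text through a pointer. The type `pkg_key` and the function `pkg_key_compare` below are hypothetical; the sketch only shows the kind of ordering function such a template would need to be given.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical key type. Two keys with the same text can hold different
   pointers, so memcmp() on the structs would compare addresses (and any
   padding bytes), not the content. */
typedef struct {
    const char *name;
    int version;
} pkg_key;

/* Three-way ordering: negative if *a < *b, zero if equal, positive if
   *a > *b. Orders by name first, then by version. */
int pkg_key_compare(const pkg_key *a, const pkg_key *b)
{
    assert(a != NULL && b != NULL);
    int by_name = strcmp(a->name, b->name);
    if (by_name != 0) {
        return by_name;
    }
    return (a->version > b->version) - (a->version < b->version);
}
```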

Having noted that, we'll stick with the simpler list-of-T problem for the schematic example; and for further simplicity I'll forget the desirability of any const API.

For this and any other template container type we'll want some token-pasting macros that let us conveniently assemble identifiers of functions and types, plus probably other utility macros. These can all go in a header, say macro_kit.h, such as:

#ifndef MACRO_KIT_H
#define MACRO_KIT_H

/* macro_kit.h */

#define _CAT2(x,y) x##y

// Concatenate 2 tokens x and y
#define CAT2(x,y) _CAT2(x,y)
// Concatenate 3 tokens x, y and z
#define CAT3(x,y,z) CAT2(x,CAT2(y,z))

// Join 2 tokens x and y with '_' = x_y
#define JOIN2(x,y) CAT3(x,_,y)
// Join 3 tokens x, y and z with '_' = x_y_z
#define JOIN3(x,y,z) JOIN2(x,JOIN2(y,z))
// Compute the memory footprint of n T's
#define SPAN(n,T)   ((n) * sizeof(T))

#endif
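
To see what these helpers produce, here is a small self-contained sketch. It repeats the macros so that it compiles on its own; the inner helper is spelled CAT2_ here because identifiers that begin with an underscore and an uppercase letter are reserved in C. STR, ELEMENT_TYPE and the generated function list_int_span are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token-pasting helpers in the style of macro_kit.h. */
#define CAT2_(x,y) x##y
#define CAT2(x,y) CAT2_(x,y)
#define CAT3(x,y,z) CAT2(x,CAT2(y,z))
#define JOIN2(x,y) CAT3(x,_,y)
#define JOIN3(x,y,z) JOIN2(x,JOIN2(y,z))
#define SPAN(n,T)   ((n) * sizeof(T))

/* Stringify after expansion, only so the generated names can be inspected. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define ELEMENT_TYPE int
#define LIST_TYPE JOIN2(list,ELEMENT_TYPE)

/* Expands to: size_t list_int_span(size_t n) */
size_t JOIN2(LIST_TYPE,span)(size_t n)
{
    return SPAN(n, ELEMENT_TYPE);
}

/* The identifiers the macros generate, as strings. */
const char *list_type_name(void) { return STR(LIST_TYPE); }
const char *push_back_name(void) { return STR(JOIN3(LIST_TYPE,push,back)); }
```

The extra level of indirection in CAT2 is what lets ELEMENT_TYPE expand to int before the tokens are pasted; pasting directly with ## would glue the macro's name rather than its value.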

Now to the schematic structure of list_type.inl:

//! There is intentionally no idempotence guard on this file
#include "macro_kit.h"
#include <stddef.h>

#ifndef INCLUDE_LIST_TYPE_INL
#error This file should only be included from headers \
that define INCLUDE_LIST_TYPE_INL
#endif

#ifndef LIST_ELEMENT_TYPE
#error Need a definition for LIST_ELEMENT_TYPE
#endif

/* list_type.inl

    Defines and implements a generic list-of-T container
    for T the current values of the macros:

    - LIST_ELEMENT_TYPE: 
        - must have a definition = the datatype (or typedef alias) for
        which a list container is required.

    - LIST_ELEMENT_COPY_INITOR:
        - If undefined, then LIST_ELEMENT_TYPE is assumed to be copy-
        initializable by the assignment operator. Otherwise must be defined
        as the name of a copy initialization function having a prototype of
        the form:

        LIST_ELEMENT_TYPE * copy_initor_name(LIST_ELEMENT_TYPE *pdest,
                                            LIST_ELEMENT_TYPE *psrc);

        that will attempt to copy the LIST_ELEMENT_TYPE at `psrc` into the
        uncommitted memory at `pdest`, returning `pdest` on success and NULL
        on failure.

        N.B. This file itself defines the copy initializer for the list-type
        that it generates.

    - LIST_ELEMENT_DISPOSE
        If undefined, then LIST_ELEMENT_TYPE is assumed to need no
        destruction. Otherwise the name of a destructor function having a
        prototype of the form:

        void dtor_name(LIST_ELEMENT_TYPE *pt);

        that appropriately destroys the LIST_ELEMENT_TYPE at `pt`.

        N.B. This file itself defines the destructor for the list-type that
        it generates.
*/

/* Define the names of the list-type to generate, 
    e.g. list_int, list_float
*/
#define LIST_TYPE JOIN2(list,LIST_ELEMENT_TYPE)

/* Define the function-names of the LIST_TYPE API.
    Each of the API macros LIST_XXXX generates a function name in
    which LIST becomes the value of LIST_TYPE and XXXX becomes lowercase,
    e.g. list_int_new
*/
#define LIST_NEW JOIN2(LIST_TYPE,new)
#define LIST_NODE JOIN2(LIST_TYPE,node)
#define LIST_DISPOSE JOIN2(LIST_TYPE,dispose)
#define LIST_COPY_INIT JOIN2(LIST_TYPE,copy_init)
#define LIST_COPY JOIN2(LIST_TYPE,copy)
#define LIST_BEGIN JOIN2(LIST_TYPE,begin)
#define LIST_END JOIN2(LIST_TYPE,end)
#define LIST_SIZE JOIN2(LIST_TYPE,size)
#define LIST_INSERT_BEFORE JOIN3(LIST_TYPE,insert,before)
#define LIST_DELETE_BEFORE JOIN3(LIST_TYPE,delete,before)
#define LIST_PUSH_BACK JOIN3(LIST_TYPE,push,back)
#define LIST_PUSH_FRONT JOIN3(LIST_TYPE,push,front)
#define LIST_POP_BACK JOIN3(LIST_TYPE,pop,back)
#define LIST_POP_FRONT JOIN3(LIST_TYPE,pop,front)
#define LIST_NODE_GET JOIN2(LIST_NODE,get)
#define LIST_NODE_NEXT JOIN2(LIST_NODE,next)
#define LIST_NODE_PREV JOIN2(LIST_NODE,prev)

/* Define the name of the structure used to implement a LIST_TYPE.
    This structure is not exposed to user code.
*/
#define LIST_STRUCT JOIN2(LIST_TYPE,struct)

/* Define the name of the structure used to implement a node of a LIST_TYPE.
    This structure is not exposed to user code.
*/
#define LIST_NODE_STRUCT JOIN2(LIST_NODE,struct)

/* The LIST_TYPE API... */


// Define the abstract list type
typedef struct LIST_STRUCT * LIST_TYPE;

// Define the abstract list node type
typedef struct LIST_NODE_STRUCT * LIST_NODE;

/* Return a pointer to the LIST_ELEMENT_TYPE in a LIST_NODE `node`,
    or NULL if `node` is null
*/
extern LIST_ELEMENT_TYPE * LIST_NODE_GET(LIST_NODE node);

/* Return the LIST_NODE successor of a LIST_NODE `node`,
    or NULL if `node` is null.
*/ 
extern LIST_NODE LIST_NODE_NEXT(LIST_NODE node);

/* Return the LIST_NODE predecessor of a LIST_NODE `node`,
    or NULL if `node` is null.
*/
extern LIST_NODE LIST_NODE_PREV(LIST_NODE node);


/* Create a new LIST_TYPE optionally initialized with elements copied from
    `start` and until `end`.

    If `end` is null it is assumed == `start` + 1.

    If `start` is not NULL then elements will be appended to the
    LIST_TYPE until `end` or until an element cannot be successfully copied.
    The size of the LIST_TYPE will be the number of successfully copied
    elements. 
*/ 
extern LIST_TYPE LIST_NEW(LIST_ELEMENT_TYPE *start, LIST_ELEMENT_TYPE *end);

/* Dispose of a LIST_TYPE
    If the pointer to LIST_TYPE `plist` is not null and addresses
    a non-null LIST_TYPE then the LIST_TYPE it addresses is
    destroyed and set NULL.
*/ 
extern void LIST_DISPOSE(LIST_TYPE * plist);

/* Copy the LIST_TYPE at `psrc` into the LIST_TYPE-sized region at `pdest`,
    returning `pdest` on success, else NULL.

    If copying is unsuccessful the LIST_TYPE-sized region at `pdest` is
    unchanged.
*/
extern LIST_TYPE * LIST_COPY_INIT(LIST_TYPE *pdest, LIST_TYPE *psrc);

/* Return a copy of the LIST_TYPE `src`, or NULL if `src` cannot be
    successfully copied.
*/
extern LIST_TYPE LIST_COPY(LIST_TYPE src);

/* Return a LIST_NODE referring to the start of the
    LIST_TYPE `list`, or NULL if `list` is null.
*/
extern LIST_NODE LIST_BEGIN(LIST_TYPE list);

/* Return a LIST_NODE referring to the end of the
    LIST_TYPE `list`, or NULL if `list` is null.
*/
extern LIST_NODE LIST_END(LIST_TYPE list);

/* Return the number of LIST_ELEMENT_TYPEs in the LIST_TYPE `list`
    or 0 if `list` is null.
*/
extern size_t LIST_SIZE(LIST_TYPE list);

/* Etc. etc. - extern prototypes for all API functions.
    ...
    ...
*/

/* If LIST_IMPLEMENT is defined then the implementation of LIST_TYPE is
    compiled, otherwise skipped. #define LIST_IMPLEMENT to include this
    file in the .c file that implements LIST_TYPE. Leave it undefined
    to include this file in the .h file that defines the LIST_TYPE API.
*/
#ifdef LIST_IMPLEMENT
// Implementation code now included.

// Standard library #includes...?

// The heap structure of a list node
struct LIST_NODE_STRUCT {
    struct LIST_NODE_STRUCT * _next;
    struct LIST_NODE_STRUCT * _prev;
    LIST_ELEMENT_TYPE _data[1];
};

// The heap structure of a LIST_TYPE
struct LIST_STRUCT {
    size_t _size;
    struct LIST_NODE_STRUCT * _anchor;
};

/* Etc. etc. - implementations for all API functions
    ...
    ...
*/

/*  Undefine LIST_IMPLEMENT whenever it was defined.
    Should never fall through. 
*/
#undef LIST_IMPLEMENT

#endif // LIST_IMPLEMENT 

/*  Always undefine all the LIST_TYPE parameters.
    Should never fall through. 
*/
#undef LIST_ELEMENT_TYPE
#undef LIST_ELEMENT_COPY_INITOR
#undef LIST_ELEMENT_DISPOSE
/* Also undefine the "I really meant to include this" flag. */

#undef INCLUDE_LIST_TYPE_INL

Note that list_type.inl has no macro-guard against multiple inclusion. You want at least some of it - at least the template API - to be included every time it is seen.
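
The implementation part of the file is elided in the listing. Purely as an illustration of how such code could honour the optional parameters, here is a sketch of a node constructor and destructor that branch on whether LIST_ELEMENT_COPY_INITOR and LIST_ELEMENT_DISPOSE are defined. It is not the original implementation: the names `node`, `node_new` and `node_free` are hypothetical, and in the real file they would be generated with the JOIN macros.

```c
#include <assert.h>
#include <stdlib.h>

/* Parameters that a wrapper would set before including the .inl.
   Here the element type is int and neither optional hook is supplied. */
#define LIST_ELEMENT_TYPE int
/* #define LIST_ELEMENT_COPY_INITOR my_copy_init */
/* #define LIST_ELEMENT_DISPOSE     my_dispose   */

struct node {
    struct node *next;
    struct node *prev;
    LIST_ELEMENT_TYPE data[1];
};

/* Allocate an unlinked node holding a copy of *psrc, or return NULL. */
struct node *node_new(LIST_ELEMENT_TYPE *psrc)
{
    assert(psrc != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    n->next = NULL;
    n->prev = NULL;
#ifdef LIST_ELEMENT_COPY_INITOR
    /* Non-POD element: delegate to the user-supplied copy initializer. */
    if (LIST_ELEMENT_COPY_INITOR(n->data, psrc) == NULL) {
        free(n);
        return NULL;
    }
#else
    /* POD element: plain assignment is enough. */
    n->data[0] = *psrc;
#endif
    return n;
}

/* Destroy the element, if it needs destruction, then release the node. */
void node_free(struct node *n)
{
    if (n == NULL) {
        return;
    }
#ifdef LIST_ELEMENT_DISPOSE
    LIST_ELEMENT_DISPOSE(n->data);
#endif
    free(n);
}
```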

If you read the comments at the top of the file you can guess how you would code a wrapping header to import a list-of-int container type.

#ifndef LIST_INT_H
#define LIST_INT_H

/* list_int.h*/

#define LIST_ELEMENT_TYPE int
#define INCLUDE_LIST_TYPE_INL
#include "list_type.inl"

#endif

and likewise how you would code the wrapping header to import a list-of-list-of-int container type:

#ifndef LIST_LIST_INT_H
#define LIST_LIST_INT_H

/* list_list_int.h*/

#define LIST_ELEMENT_TYPE list_int
#define LIST_ELEMENT_COPY_INITOR list_int_copy_init
#define LIST_ELEMENT_DISPOSE list_int_dispose
#define INCLUDE_LIST_TYPE_INL
#include "list_type.inl"

#endif 

Your applications can safely include such wrappers, e.g.

#include "list_int.h"
#include "list_list_int.h"

despite the fact that they define LIST_ELEMENT_TYPE in conflicting ways, because list_type.inl always #undefs all the macros that parameterize the list-type when it's done with them: see the last few lines of the file.

Note too the use of the macro LIST_IMPLEMENT. If it's undefined when list_type.inl is parsed then only the template API is exposed; the template implementation is skipped. If LIST_IMPLEMENT is defined then the whole file is compiled. Thus our wrapping headers, by not defining LIST_IMPLEMENT, import only the list-type API.

Conversely for our wrapping source files list_int.c, list_list_int.c, we will define LIST_IMPLEMENT. After that, there's nothing to do but include the corresponding header:

/* list_int.c */
#define LIST_IMPLEMENT
#include "list_int.h"

and:

/* list_list_int.c*/
#include "list_int.h"
#define LIST_IMPLEMENT
#include "list_list_int.h"

Now in your application, no list-template macros appear. Your wrapping headers parse out to "real code":

#include "list_int.h"
#include "list_list_int.h"
// etc.

int main(void)
{
    int idata[10] = {1,2,3,4,5,6,7,8,9,10};
    //...
    list_int lint = list_int_new(idata,idata + 10);
    //...
    list_list_int llint = list_list_int_new(&lint,0);
    //...
    list_int_dispose(&lint);
    //...
    list_list_int_dispose(&llint);
    //...
    return 0;
}

To equip yourself with a "C template library" this way the only (!) hard work is to write the .inl file for each container type you want and to test it very, very thoroughly. You would then probably generate an object file and header for each combination of native datatype and container type for off-the-shelf linkage, and knock out the .h and .c wrappers in a jiffy for other types on demand.

Needless to say, as soon as C++ sprouted templates my enthusiasm for sweating them out this way evaporated. But it can be done this way, completely generically, if for some reason C is the only option.



Answer 2:

You could always add a second argument to the DEFINE_LIST macro that will allow you to "name" the list. For instance:

#define DEFINE_LIST(TYPE, NAME)            \
struct _List_##TYPE##_##NAME               \
{                                          \
    TYPE member_1;                         \
    struct _List_##TYPE##_##NAME* next;    \
}

Then you could simply do:

DEFINE_LIST(int, my_list);
//... more code that uses the "my_list" type

You would just have to restrict yourself to not re-using the same list "name" when two different header files include each other, and both use the DEFINE_LIST macro. You would also have to refer to the list by name when using LIST_CREATE, etc.
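
A compilable sketch of this naming scheme follows. It assumes the tag is pasted as TYPE##_##NAME, so that both macro arguments are substituted, and it drops the leading underscore because identifiers starting with an underscore and an uppercase letter are reserved. The two-argument LIST and LIST_CREATE macros shown here are assumptions about how the companion macros might look.

```c
#include <assert.h>
#include <stdlib.h>

/* Each (TYPE, NAME) pair yields a distinct struct tag. */
#define LIST(TYPE, NAME) struct List_##TYPE##_##NAME

#define DEFINE_LIST(TYPE, NAME)      \
    LIST(TYPE, NAME)                 \
    {                                \
        TYPE member_1;               \
        LIST(TYPE, NAME) *next;      \
    }

/* Hypothetical LIST_CREATE: allocates one zero-filled node (NULL on failure). */
#define LIST_CREATE(TYPE, NAME) \
    ((LIST(TYPE, NAME) *)calloc(1, sizeof(LIST(TYPE, NAME))))

DEFINE_LIST(int, my_list);
DEFINE_LIST(int, other_list);   /* same element type, different name: no clash */
```

Because the two definitions carry different names, they can coexist in one translation unit even though both hold int.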

When passing the lists to functions that you've written, you can always create "generic" types that the user-defined "named" versions are cast to. This shouldn't affect anything since the actual information in the struct stays the same, and the "name" tag merely differentiates the types from a declaration rather than binary standpoint. For example, here is a function that takes list objects that store int types:

#define GENERIC_LIST_PTR(TYPE) struct _generic_list_type_##TYPE*
#define LIST_CAST_PTR(OBJ, TYPE) (GENERIC_LIST_PTR(TYPE))(OBJ)

void function(GENERIC_LIST_PTR(int) list)
{
    //...use list as normal (i.e., access its int data-member, etc.)
}

DEFINE_LIST(int, my_list);

int main()
{
    LIST(int, my_list)* list = LIST_CREATE(int, my_list);
    function(LIST_CAST_PTR(list, int));

    //...more code

    return 0;
}

I know this isn't necessarily the most convenient thing, but it does resolve the naming issues, and you can control which versions of struct _generic_list_type_XXX are created in some private header file that other users won't be adding to (unless they wish to do so for their own types). It would be a mechanism for separating the declaration and the definition of the generic list-type from the actual user-defined list-type.



Answer 3:

Why don't you use a library? I like to use GLib, but I hate the void pointers it puts in my code, so to get a typesafe version of the data types provided by GLib I coded some very simple macros:

http://pastebin.com/Pc0KsadV

If I want a list of Symbol* (assuming it's a type I defined earlier) I just need to do:

GLIST_INSTANCE(SymbolList, Symbol*, symbol_list);

If you don't want to use a whole library (which would be overkill) for a simple linked list, implement a list that handles void* and write some functions that encapsulate it and perform the correct type casts.
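
Here is a minimal sketch of that last suggestion: one untyped void* list, plus a macro that generates a distinct wrapper type and typed accessor functions for each element type. It does not reproduce the macros from the pastebin link; `vnode`, `vlist_push`, `vlist_free` and `TYPED_LIST_INSTANCE` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Untyped singly linked list: the only code that handles void *. */
typedef struct vnode {
    void *data;
    struct vnode *next;
} vnode;

/* Prepend `data` to the list at *head. Returns 1 on success, 0 on failure. */
int vlist_push(vnode **head, void *data)
{
    vnode *n = malloc(sizeof *n);
    if (n == NULL) {
        return 0;
    }
    n->data = data;
    n->next = *head;
    *head = n;
    return 1;
}

/* Free every node (but not the objects the nodes point to). */
void vlist_free(vnode **head)
{
    while (*head != NULL) {
        vnode *next = (*head)->next;
        free(*head);
        *head = next;
    }
}

/* Generates a list type NAME for pointer type ELEM, with typed wrappers
   PREFIX_push, PREFIX_front and PREFIX_free around the untyped list. */
#define TYPED_LIST_INSTANCE(NAME, ELEM, PREFIX)                      \
    typedef struct { vnode *head; } NAME;                            \
    static inline int PREFIX##_push(NAME *l, ELEM e)                 \
    { return vlist_push(&l->head, e); }                              \
    static inline ELEM PREFIX##_front(const NAME *l)                 \
    { return l->head ? (ELEM)l->head->data : NULL; }                 \
    static inline void PREFIX##_free(NAME *l)                        \
    { vlist_free(&l->head); }

typedef struct { int id; } Symbol;

/* No trailing semicolon: the expansion ends with a function definition. */
TYPED_LIST_INSTANCE(SymbolList, Symbol *, symbol_list)
```

User code only ever sees SymbolList and Symbol *, so passing some other pointer type to symbol_list_push should draw a compiler diagnostic instead of being silently accepted as void *.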



Answer 4:

How about creating a list_template.h file and then creating a list_TYPE.h file and a list_TYPE.c file for every instance of the template? These can come with the proper include guards, of course. You can then include only your template header, but make sure to add all the .c files to the compile and link process, and it should work.

This is basically what C++ does automatically for you... Duplicating the instances...



Answer 5:

I really doubt you can both check for existence and define a struct in a single macro, since a macro expansion cannot produce preprocessor directives such as #ifndef. Put a separate #ifndef check before each DEFINE_LIST(int). It's not elegant, but it does what you want.
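
A sketch of that guard follows, with the contents of the two headers shown one after the other in a single file. The guard macro name LIST_INT_DEFINED and the struct members are assumptions; the question elides the struct body.

```c
#include <assert.h>
#include <stddef.h>

#define LIST(TYPE) struct List_##TYPE

#define DEFINE_LIST(TYPE)   \
    LIST(TYPE)              \
    {                       \
        TYPE value;         \
        LIST(TYPE) *next;   \
    }

/* As it might appear in one.h */
#ifndef LIST_INT_DEFINED
#define LIST_INT_DEFINED
DEFINE_LIST(int);
#endif

/* As it might appear in two.h, which also includes one.h: the guard
   macro is already set, so the struct is not defined a second time. */
#ifndef LIST_INT_DEFINED
#define LIST_INT_DEFINED
DEFINE_LIST(int);
#endif
```

The cost is one guard macro per element type, which every file using DEFINE_LIST for that type has to spell the same way.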



Answer 6:

It is possible to create generic and type-safe containers with macros. From the viewpoint of the theory of computation, the language (code) generated by macro expansion can be recognized by a nondeterministic pushdown automaton, which means that it is at most context-free. This seems to make our goal impossible to achieve, since the container and its affiliated iterators should remember the type they contain, and this can only be done by a context-sensitive grammar. However, we can do some tricks!

The key to success lies in the compilation process, specifically in the building of symbol tables. If the type of a variable can be recognized when the compiler queries the table and no unsafe type casting occurs, then the code is regarded as type-safe. Therefore, we have to give every struct a distinct name, because struct names may conflict if two or more structs are declared at the same level of scope. The easiest way is to append the current line number to the struct name. Standard C has supported the predefined macro __LINE__ and macro concatenation / stringification since ANSI C (C89/C90).
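
The line-number trick can be sketched as follows. This is an illustration of the idea only, not code taken from any library; PASTE and ANON_PAIR are hypothetical names.

```c
#include <assert.h>

#define PASTE_(a, b) a##b
#define PASTE(a, b)  PASTE_(a, b)   /* extra level so __LINE__ expands first */

/* Declares a struct whose tag embeds the current line number, so two
   uses in the same scope do not clash as long as they sit on different
   lines. */
#define ANON_PAIR(TYPE) \
    struct PASTE(pair_at_line_, __LINE__) { TYPE first; TYPE second; }

int sum_pairs(void)
{
    ANON_PAIR(int) a = { 1, 2 };
    ANON_PAIR(int) b = { 3, 4 };   /* different line, different struct tag */
    return a.first + a.second + b.first + b.second;
}
```

The obvious limitation is that two uses on the same source line in the same scope would generate the same tag.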

Then, what we have to do is hide some attributes in the struct we defined as above. If you want to create another list record at run-time, putting a pointer to itself in the struct will actually solve the problem. However, this is not enough. We might need an extra variable to store how many list records we have allocated at run-time. This helps us figure out how to free the memory when the list is explicitly destroyed by the programmer. Also, we can take advantage of the __typeof__() extension, which is widely used in macro programming.
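
Putting these pieces together, a toy version might look like the sketch below: a struct with a uniquely tagged, self-referential node type, an allocation counter, and __typeof__ (a GCC/Clang extension) to recover the node type inside the macros. This is not OpenGC3's implementation; SLIST, SLIST_PUSH and SLIST_FREE are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define PASTE_(a, b) a##b
#define PASTE(a, b)  PASTE_(a, b)

/* A list record: an allocation count plus a pointer to a node type whose
   tag is made unique with __LINE__ and which points to itself via `next`. */
#define SLIST(TYPE)                                       \
    struct {                                              \
        size_t count;                                     \
        struct PASTE(slist_node_, __LINE__) {             \
            struct PASTE(slist_node_, __LINE__) *next;    \
            TYPE value;                                   \
        } *head;                                          \
    }

/* Push `val` at the front. On allocation failure the list is unchanged,
   which the caller can detect through `count`. */
#define SLIST_PUSH(list, val)                                     \
    do {                                                          \
        __typeof__((list).head) node_ = malloc(sizeof *node_);    \
        if (node_ != NULL) {                                      \
            node_->value = (val);                                 \
            node_->next = (list).head;                            \
            (list).head = node_;                                  \
            (list).count++;                                       \
        }                                                         \
    } while (0)

/* Free every node and bring the count back to zero. */
#define SLIST_FREE(list)                                          \
    do {                                                          \
        while ((list).head != NULL) {                             \
            __typeof__((list).head) next_ = (list).head->next;    \
            free((list).head);                                    \
            (list).head = next_;                                  \
            (list).count--;                                       \
        }                                                         \
    } while (0)

/* Pushes 1..n and returns the sum of the values actually stored. */
int sum_first(int n)
{
    SLIST(int) list = { 0, NULL };
    int total = 0;
    for (int i = 1; i <= n; i++) {
        SLIST_PUSH(list, i);
    }
    for (__typeof__(list.head) p = list.head; p != NULL; p = p->next) {
        total += p->value;
    }
    SLIST_FREE(list);
    return total;
}
```

No cast appears anywhere: the assignment `node_->value = (val)` is checked against the element type given at the declaration, so storing an incompatible value should be diagnosed by the compiler.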

I am the author of OpenGC3, which aims at building type-safe generic containers with macros, and here is a short example of how this library works:

ccxll(int) list;                      //  declare a list of type int
ccxll_init(list);                     //  initialize the list record

for (int cnt = 8; cnt-- > 0; )        //
    ccxll_push_back(list, rand());    //  insert "rand()" to the end

ccxll_sort(list);                     //  sort with comparator: XLEQ

CCXLL_INCR_AUTO(pnum, list)           //  traverse the list forward:
    printf("num = %d\n", *pnum);      //  access elems through iters

ccxll_free(list);                     //  destroy the list after use

It is quite similar to the syntax of the STL. The type of the list is determined when the list is declared. We need not worry about type safety, because no unsafe type casting takes place when operations are performed on the list.