Typesafe varargs in C with gcc

Posted 2019-04-03 10:15

Question:

Many times I want a function to receive a variable number of arguments, terminated by NULL, for instance:

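#include <stdarg.h>
#include <stddef.h>
/* Append a NULL sentinel to the caller's arguments. */
#define push(stack, ...) _push(stack, __VA_ARGS__, (char *)NULL)
void _push(stack_t stack, char *s, ...) {
    va_list args;
    va_start(args, s);
    /* Handle the named argument s first, then read until the NULL sentinel. */
    for (; s != NULL; s = va_arg(args, char *))
        push_single(stack, s);
    va_end(args);
}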

Can I instruct gcc or clang to warn if push receives arguments that are not char *? Something similar to __attribute__((format)), but for multiple arguments of the same pointer type.

Answer 1:

I know you're thinking of using __attribute__((sentinel)) somehow, but this is a red herring.

What you want is to do something like this:

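/* GCC-specific: a statement expression plus a named variadic parameter.
   Each argument initialises a char * array element, so arguments of the
   wrong type are diagnosed by the compiler. */
#define push(s, args...) ({                          \
  char *_args[] = {args};                            \
  _push(s, _args, sizeof(_args) / sizeof(_args[0])); \
})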

which wraps:

void _push(stack_t s, char *args[], int argn);

which you can write exactly the way you would hope you can write it!

Then you can call:

push(stack, "foo", "bar", "baz");
push(stack, "quux");


Answer 2:

I can only think of something like this:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct tArg
{
  const char* Str;
  struct tArg* Next;
} tArg;

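// Prepends str to the list nextArg; on allocation failure, frees nextArg's nodes and returns NULL.
tArg* Arg(const char* str, tArg* nextArg)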
{
  tArg* p = malloc(sizeof(tArg));
  if (p != NULL)
  {
    p->Str = str;
    p->Next = nextArg;
  }
  else
  {
    while (nextArg != NULL)
    {
      p = nextArg->Next;
      free(nextArg);
      nextArg = p;
    }
  }
  return p;
}

// Prints every string in the list, freeing each node as it goes.
void PrintR(tArg* arg)
{
  while (arg != NULL)
  {
    tArg* p;
    printf("%s", arg->Str);
    p = arg->Next;
    free(arg);
    arg = p;
  }
}

// Print8 prints Str and returns a function pointer, so up to eight calls can be chained.
void (*(*(*(*(*(*(*Print8
  (const char* Str))
  (const char*))
  (const char*))
  (const char*))
  (const char*))
  (const char*))
  (const char*))
  (const char*)
{
  printf("%s", Str);
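  // Calling Print8 through this converted pointer (as the chain in main does) is
  // undefined behavior: the pointer type is not compatible with Print8's own type.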
  return (void(*(*(*(*(*(*(*)
    (const char*))
    (const char*))
    (const char*))
    (const char*))
    (const char*))
    (const char*))
    (const char*))&Print8;
}

int main(void)
{
  PrintR(Arg("HELLO", Arg(" ", Arg("WORLD", Arg("!", Arg("\n", NULL))))));
//  PrintR(Arg(1, NULL));        // warning/error
//  PrintR(Arg(&main, NULL));    // warning/error
//  PrintR(Arg(0, NULL));        // no warning/error
//  PrintR(Arg((void*)1, NULL)); // no warning/error

  Print8("hello")(" ")("world")("!")("\n");
// Same warning/error compilation behavior as with PrintR()
  return 0;
}


Answer 3:

The trouble with C variadics is that they were bolted on afterwards rather than designed into the language. The main problem is that the variadic parameters are anonymous: they have no handles, no identifiers. This leads to the unwieldy va_* macros that generate references to parameters without names. It also means those macros have to be told where the variadic list starts and what type each parameter is expected to have.

All this information really ought to be encoded in proper syntax in the language itself.

For example, one could extend existing C syntax with formal parameters after the ellipsis, like so

void foo ( ... int counter, float arglist );

By convention, the first parameter could be for the argument count and the second for the argument list. Within the function body, the list could be treated syntactically as an array.

With such a convention, the variadic parameters would no longer be anonymous. Within the function body, the counter can be referenced like any other parameter, and the list elements can be referenced as if they were elements of an array parameter, like so

void foo ( ... int counter, float arglist ) {
  int i;
  for (i=0; i<counter; i++) {
    printf("list[%i] = %f\n", i, arglist[i]);
  }
}

With such a feature built into the language itself, every reference to arglist[i] would then be translated to the respective addresses on the stack frame. There would be no need to do this via macros.

Furthermore, the argument count would automatically be inserted by the compiler, further reducing the opportunity for error.

A call to

foo(1.23, 4.56, 7.89);

would be compiled as if it had been written

foo(3, 1.23, 4.56, 7.89);

Within the function body, any access to an element beyond the number of arguments actually passed could be checked at runtime and cause a runtime fault, thereby greatly enhancing safety.

Last but not least, all the variadic parameters are typed and can be type checked at compile time just like non-variadic parameters are checked.

In some use cases it would of course be desirable to have alternating types, such as when writing a function to store keys and values in a collection. This could also be accommodated simply by allowing more formal parameters after the ellipsis, like so

void store ( collection dict, ... int counter, key_t key, val_t value );

This function could then be called as

store(dict, key1, val1, key2, val2, key3, val3);

but would be compiled as if it had been written

store(dict, 3, key1, val1, key2, val2, key3, val3);

The types of actual parameters would be compile time checked against the corresponding variadic formal parameters.

Within the body of the function the counter would again be referenced by its identifier, and keys and values would be referenced as if they were arrays:

key[i] refers to the key of the i-th key/value pair
value[i] refers to the value of the i-th key/value pair

and these references would be compiled to their respective addresses on the stack frame.

None of this is really difficult to do, nor has it ever been. However, C's design philosophy simply isn't conducive to such features.

Unless an adventurous C compiler (or C preprocessor) implementor takes the lead in implementing this or a similar scheme, it is unlikely we will ever see anything of this kind in C.

The trouble is that folks who are interested in type safety and willing to put in the work to build their own compilers usually come to the conclusion that the C language is beyond salvage and that one may as well start over with a better-designed language.

I have been there myself: I eventually abandoned the attempt, then implemented one of Wirth's languages and added type-safe variadics to that instead. I have since run into other people who told me about their own aborted attempts. Proper type-safe variadics in C seem likely to remain elusive.