How much is it possible to create fake-functions w

Posted 2019-01-23 17:32

Question:

People always say that macros are unsafe, that they do not (directly) type-check their arguments, and so on. Worse: when errors occur, the compiler gives intricate and incomprehensible diagnostics, because the macro is just a mess.

Is it possible to use macros in almost the same way as a function, with safe type-checking, avoiding the typical pitfalls, and in such a way that the compiler gives the right diagnostics?

  1. I am going to answer this question (auto-answering) in an affirmative way.
  2. I want to show you the solutions that I've found to this problem.
  3. The standard C99 will be used and respected, to have a uniform background.
  4. But (obviously there is a "but"), the solution will "define" some kind of "syntax" that people would have to "eat".
  5. This special syntax is intended to be as simple to write and as easy to understand and handle as possible, minimizing the risk of ill-formed programs and, more importantly, obtaining the right diagnostic messages from the compiler.
  6. Finally, I will study two cases: "non-returning value" macros (the easy case) and "returning-value" macros (the not-so-easy, but more interesting case).

Let us quickly remember some typical pitfalls produced by macros.

Example 1

#define SQUARE(X) X*X
int i = SQUARE(1+5);

Intended value of i: 36. True value of i: 11 (with macro expansion: 1+5*1+5). Pitfall!

(Typical) Solution (Example 2)

#define SQUARE(X) (X)*(X)
int i = (int) SQUARE(3.9);

Intended value of i: 15. True value of i: 11 (after macro expansion: (int) (3.9)*(3.9)). Pitfall!

(Typical) Solution (Example 3)

#define SQUARE(X) ((X)*(X))

It works fine with integers and floats, but it is easily broken:

int x = 2;
int i = SQUARE(++x);

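Intended value of i: 9 (because (2+1)*(2+1)...). Actual value of i: not even well defined. The macro expands to ((++x)*(++x)), which modifies x twice with no sequence point in between, so the behavior is undefined in C; compilers commonly produce 12 (3*4) or 16 (4*4). Pitfall!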
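
The first two pitfalls are fully deterministic, so they can be checked directly. The following is only a minimal sketch: the names SQUARE_V1, SQUARE_V2, SQUARE_V3 and the helper functions are hypothetical, introduced here just to keep the three definitions side by side. The ++x case is left out on purpose, since it modifies x twice in one expression.

```c
/* Hypothetical names: the three SQUARE variants from Examples 1 to 3. */
#define SQUARE_V1(X) X*X          /* no parentheses at all            */
#define SQUARE_V2(X) (X)*(X)      /* parentheses around each argument */
#define SQUARE_V3(X) ((X)*(X))    /* parentheses around the result too */

/* Example 1: expands to 1+5*1+5, so the result is 11, not 36. */
int example1_unparenthesized(void) { return SQUARE_V1(1+5); }

/* Example 2: expands to (int)(3.9)*(3.9); the cast applies to the first
   factor only, giving 3*3.9 = 11.7, which is truncated to 11. */
int example2_cast(void) { int i = (int) SQUARE_V2(3.9); return i; }

/* The fully parenthesized version gives the intended values. */
int example1_fixed(void) { return SQUARE_V3(1+5); }        /* 36 */
int example2_fixed(void) { return (int) SQUARE_V3(3.9); }  /* 15 */
```

Calling the four helpers from a small test program should reproduce the values quoted in Examples 1 and 2.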

A nice method for type-checking in macros can be found here:

  • How to verify a type in a C macro? (by J. Gustedt)

However I want more: some kind of interface or "standard" syntax, and a (small) number of easy-to-remember rules. The intent is "be able to use (not to implement)" macros as similar to functions as possible. That means: well written fake-functions.

Why is that interesting in some way?

I think that is an interesting challenge to achieve in C.

Is it useful?

Edit: In standard C it is not possible to define nested functions. But sometimes one would prefer to be able to define short (inline) functions nested inside other ones. Thus, a function-like prototyped macro is a possibility to take into account.

Answer 1:

This answer is divided into 5 sections:

  1. Proposed solution for block macros.
  2. A brief summary of that solution.
  3. Macro-prototype syntax is discussed.
  4. Proposed solution for function-like macros.
  5. (Important update:) Breaking my code.

(1.) 1st case. Block macros (or non-returning value macros)

Let us consider easy examples first. Suppose that we need a "command" that prints the square of integer numbers, followed by '\n'. We decided to implement it with a macro. But we want the argument to be verified by the compiler as an int. We write:

#define PRINTINT_SQUARE(X) {    \
   int x = (X);              \
   printf("%d\n", x*x);      \
}
  • The parentheses surrounding (X) avoid almost all pitfalls.
  • Moreover, the parentheses help the compiler to properly diagnose syntax errors.
  • The macro parameter X is invoked only once inside the macro. This avoids the pitfall of Example 3 of the question.
  • The value of X is immediately held in the variable x.
  • In the rest of the macro, we use the variable x instead of X.
  • [Important Update:] (This code can be broken: see section 5).

If we systematize this discipline, the typical problems of macros will be avoided.
Now, something like this correctly prints 9:

int i = 3;
PRINTINT_SQUARE(i++);  
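
One way to convince yourself that the argument is evaluated only once is to look at i after the "call". This is a small sketch, assuming the brace-only definition given above; the helper single_evaluation_demo is a hypothetical name used only for the illustration.

```c
#include <stdio.h>

#define PRINTINT_SQUARE(X) {    \
   int x = (X);                 \
   printf("%d\n", x*x);         \
}

/* Hypothetical helper: returns the value of i after the macro "call". */
int single_evaluation_demo(void)
{
    int i = 3;
    PRINTINT_SQUARE(i++);   /* prints 9; i++ is evaluated exactly once */
    return i;               /* 4, not 5 */
}
```

With the naive ((X)*(X)) style, the side effect i++ would have appeared twice in the expansion.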

Obviously this approach could have a weak point: the variable x defined inside the macro could have conflicts with other variables in the program also called x. This is a scope issue. However, it's not a problem since the macro-body has been written as a block enclosed by { }. This is enough to handle every scope-issue, and every potential problem with the "inner" variables x is tackled.

It could be argued that the variable x is an extra object and maybe not desired. But x has only temporary duration: it is created at the beginning of the macro, with the opening {, and it is destroyed at the end of the macro, with the closing }. In this way, x works like a function parameter: a temporary variable is created to hold the value of the parameter, and it is finally discarded when the macro "returns". We are not committing any sin that functions have not already committed!

More important: when the programmer attempts to "call" the macro with a wrong parameter, the compiler gives the same diagnostic that a function would give under the same situation.

So, it seems every macro pitfall has been solved!

However, we have a little syntactical issue, as you can see here:

  • C multi-line macro: do/while(0) vs scope block
  • Why use apparently meaningless do-while and if-else statements in macros?
  • do ... while (0) macro substitutions

Therefore, it is imperative (I say) to add a do {} while(0) construct to the block-like macro definition:

#define PRINTINT_SQUARE(X) do {    \
   int x = (X);              \
   printf("%d\n", x*x);      \
} while(0)
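
The reason becomes visible as soon as the macro is used as the body of an if that has an else. The sketch below assumes the do/while(0) definition just given; if_else_demo is a hypothetical helper written only for this illustration.

```c
#include <stdio.h>

#define PRINTINT_SQUARE(X) do {    \
   int x = (X);                    \
   printf("%d\n", x*x);            \
} while(0)

/* Hypothetical helper: returns 1 if the else branch ran, 0 otherwise. */
int if_else_demo(int n)
{
    int else_taken = 0;
    if (n != 0)
        PRINTINT_SQUARE(n);   /* the ';' completes the do/while statement */
    else
        else_taken = 1;       /* with a bare { } body, the ';' above would
                                 end the if and leave this else orphaned */
    return else_taken;
}
```

With the brace-only version of the macro, the same function would not compile, because "{ ... };" followed by else is a syntax error.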

Now, this do { } while(0) stuff works fine, but it is unaesthetic. The problem is that it has no intuitive meaning for the programmer. I suggest a more meaningful approach, like this:

#define xxbeg_macroblock do {
#define xxend_macroblock } while(0)
#define PRINTINT_SQUARE(X)        \
  xxbeg_macroblock             \
       int x = (X);            \
       printf("%d\n", x*x);    \
  xxend_macroblock

(The inclusion of } in xxend_macroblock avoids some ambiguity with while(0)). Of course, this syntax is not safe anymore. It has to be carefully documented to avoid misuses. Consider the following ugly example:

{ xxend_macroblock printf("Hello");

(2.) Summarizing

Block-defined macros that do not return values can behave like functions if we write them by following the disciplined style:

#define xxbeg_macroblock do {
#define xxend_macroblock } while(0)

#define MY_BLOCK_MACRO(Par1, Par2, ..., ParN)     \
  xxbeg_macroblock                         \
       desired_type1 temp_var1 = (Par1);   \
       desired_type2 temp_var2 = (Par2);   \
       /*   ...        ...         ...  */ \
       desired_typeN temp_varN = (ParN);   \
       /* (do stuff with objects temp_var1, ..., temp_varN); */ \
  xxend_macroblock
  • A call to the macro MY_BLOCK_MACRO() is a statement, not an expression: there is no "return" value of any kind, not even void.
  • The macro parameters must be used just once, at the beginning of the macro, and pass their values to actual temporary variables with block-scope. In the rest of the macro, only these variables may be used.

(3.) Can we provide an interface for the parameters of the macro?

Although we solved the problem of type-checking the parameters, the programmer cannot figure out what type the parameters "have". It is necessary to provide some kind of macro prototype! This is possible, and very safely, but we have to tolerate a slightly tricky syntax and also some restrictions.

Can you figure out what the following lines do?

xxMacroPrototype(PrintData, int x; float y; char *z; int n; );
#define PrintData(X, Y, Z, N) { \
    PrintData data = { .x = (X), .y = (Y), .z = (Z), .n = (N) }; \
    printf("%d %g %s %d\n", data.x, data.y, data.z, data.n); \
  }
PrintData(1, 3.14, "Hello", 4);
  • The 1st line "defines" the prototype for the macro PrintData.
  • Below it, the function-like macro PrintData is defined.
  • The 3rd line declares a temporary variable data which collects all the arguments of the macro at once.
  • This step has to be written manually, with care, by the programmer... but the syntax is easy, and the compiler rejects (at least) arguments whose type is incompatible with the corresponding temporary variable.
  • (However, the compiler will be silent about a "reversed" assignment such as .x = (N), .n = (X).)

To declare a prototype, we write xxMacroPrototype with 2 arguments:

  1. The name of the macro.
  2. The list of types and names of "local" variables that will be used inside the macro. We will call these items the pseudoparameters of the macro.

    • The list of pseudoparameters has to be written as a list of type-variable pairs, separated (and ended) by semicolons (;).

    • In the body of the macro, the first statement will be a declaration of this form:
      MacroName foo = { .pseudoparam1 = (MacroPar1), .pseudoparam2 = (MacroPar2), ..., .pseudoparamN = (MacroParN) }

    • Inside the macro, the pseudoparameters are accessed as foo.pseudoparam1, foo.pseudoparam2, and so on.

The definition of xxMacroPrototype() is as follows:

#define xxMacroPrototype(NAME, ARGS) typedef struct { ARGS } NAME

Simple, isn't it?

  • The pseudoparameters are implemented as a typedef struct.
  • It is guaranteed that ARGS is a well-formed list of type-identifier pairs, because the compiler has to accept it as the member list of a struct.
  • It is guaranteed that the compiler will give understandable diagnostics.
  • The list of pseudoparameters has the same restrictions as a struct declaration. (For example, a flexible array member can only appear at the end of the list.) (In particular, it is recommended to use pointer declarators instead of array declarators of variable size as pseudoparameters.)
  • It is not guaranteed that NAME is a real macro-name (but this fact is not too relevant).
    What matters is that we know that some struct-type has been defined "there", associated to the parameter-list of a macro.
  • It is not guaranteed that the list of pseudoparameters, provided by ARGS actually coincides in some way with the list of arguments of the real macro.
  • It is not guaranteed that a programmer will use this correctly inside the macro.
  • The struct type has the same scope as any other declaration placed at the point where xxMacroPrototype is invoked.
  • It is recommended practice to put together the macro prototype immediately followed by the corresponding macro definition.
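
Putting the rules above together, a complete block macro with a prototype might look like the following sketch. The macro AddWeighted, the variable running_total, the local name xxargs and the helper add_weighted_demo are all hypothetical names chosen for this illustration; only xxMacroPrototype comes from the text above.

```c
#define xxMacroPrototype(NAME, ARGS) typedef struct { ARGS } NAME

/* Hypothetical accumulator that the macro updates. */
static double running_total = 0.0;

/* Prototype first, macro definition immediately after it. */
xxMacroPrototype(AddWeighted, int value; double weight; );
#define AddWeighted(V, W) do {                                   \
    AddWeighted xxargs = { .value = (V), .weight = (W) };        \
    running_total += xxargs.value * xxargs.weight;               \
} while (0)

/* Hypothetical helper: each argument is read once, through xxargs. */
double add_weighted_demo(void)
{
    int i = 2;
    running_total = 0.0;
    AddWeighted(i++, 1.5);   /* adds 2 * 1.5 */
    AddWeighted(i++, 2.0);   /* adds 3 * 2.0 */
    return running_total;    /* 9.0 */
}
```

The typedef and the macro can share a name because a function-like macro is only expanded when its name is followed by an opening parenthesis.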

However, it is easy to be disciplined with this kind of declaration, and it is easy for the programmer to respect the rules.

Can a block-macro 'return' a value?

Yes. Actually, it can hand back as many values as you want, simply by passing arguments by reference (that is, as pointers), as scanf() does.
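
As a sketch of this idea, the block macro below delivers two results through pointer arguments. DIVMOD, its xx_-prefixed temporaries and the helper divmod_matches are hypothetical names made up for the example; the xxbeg_macroblock and xxend_macroblock wrappers are the ones defined earlier.

```c
#define xxbeg_macroblock do {
#define xxend_macroblock } while(0)

/* Hypothetical block macro with two "output" parameters. */
#define DIVMOD(A, B, QPTR, RPTR)        \
  xxbeg_macroblock                      \
       int  xx_a = (A);                 \
       int  xx_b = (B);                 \
       int *xx_q = (QPTR);              \
       int *xx_r = (RPTR);              \
       *xx_q = xx_a / xx_b;             \
       *xx_r = xx_a % xx_b;             \
  xxend_macroblock

/* Hypothetical helper: returns 1 when DIVMOD yields the expected pair. */
int divmod_matches(int a, int b, int expect_q, int expect_r)
{
    int q, r;
    DIVMOD(a, b, &q, &r);
    return q == expect_q && r == expect_r;
}
```

Because the pointer parameters are copied into int * temporaries, passing something that is not a pointer to int draws the same kind of diagnostic a function call would.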

But you probably are thinking of something else:

(4.) 2nd case. Function-like macros

For them, we need a slightly different method to declare macro prototypes, one that includes a type for the returned value. Also, we'll have to learn a (not hard) technique that lets us keep the safety of block macros, with a return value having the type we want.

The typechecking of arguments can be achieved as shown here:

  • How to verify a type in a C macro

In block macros we can declare the struct variable of type NAME inside the macro itself, thus keeping it hidden from the rest of the program. For function-like macros this cannot be done (in standard C99). We have to define a variable of type NAME before any invocation of the macro. If we are ready to pay this price, then we can earn the desired "safe function-like macro", with return values of a specific type.
We show the code, with an example, and then we comment on it:

#define xxFuncMacroPrototype(RETTYPE, MACRODATA, ARGS) typedef struct { RETTYPE xxmacro__ret__; ARGS } MACRODATA

xxFuncMacroPrototype(float, xxSUM_data, int x; float y; );
xxSUM_data xxsum;
#define SUM(X, Y) ( xxsum = (xxSUM_data){ .x = (X), .y = (Y) }, \
    xxsum.xxmacro__ret__ = xxsum.x + xxsum.y, \
    xxsum.xxmacro__ret__)

printf("%g\n", SUM(1, 2.2));

The first line defines the "syntax" for function-macro prototypes.
Such a prototype has 3 arguments:

  1. The type of the "return" value.
  2. The name of the "typedef struct" used to hold the pseudoparameters.
  3. The list of pseudoparameters, separated (and ended) by semicolon (;).

The "return" value is an additional field in the struct, with a fixed name: xxmacro__ret__.
This is declared, for safety, as the first element in the struct. Then the list of pseudoparameters is "pasted".

When we use this interface (if you let me call it this way), we have to follow a series of rules, in order:

  1. Write a prototype declaration giving 3 parameters to xxFuncMacroPrototype() (the 2nd line of the example).
  2. The 2nd parameter is the name of a typedef struct that the macro itself builds, so you do not have to worry about it; just use it (in the example this type is xxSUM_data).
  3. Define a variable whose type is simply that struct type (in the example: xxSUM_data xxsum;).
  4. Define the desired macro, with the appropriate number of arguments: #define SUM(X, Y).
  5. The body of the macro must be surrounded by parentheses ( ), in order to obtain an EXPRESSION (thus, a "returning" value).
  6. Inside these parentheses, we can separate a long list of operations and function calls by using comma operators (,).
  7. The first operation we need is to "pass" the arguments X, Y, of the macro SUM(X,Y), to the global variable xxsum. This is done by:

xxsum = (xxSUM_data){ .x = (X), .y = (Y) },

Observe that an object of type xxSUM_data is created on the fly with the aid of the compound literals provided by C99 syntax. The fields of this object are filled by reading the arguments X, Y of the macro, just once, and surrounded by parentheses, for safety.
Then we evaluate a list of expressions and functions, all of them separated by comma operators (,).
Finally, after the last comma, we just write xxsum.xxmacro__ret__, which is considered as the last term in the comma expression, and thus is the "returning" value of the macro.
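
The steps above can be assembled into a compilable sketch as follows. The declarations are the ones from the example; the static qualifier on xxsum and the two helper functions are assumptions added only so that the behavior can be checked.

```c
#define xxFuncMacroPrototype(RETTYPE, MACRODATA, ARGS) \
    typedef struct { RETTYPE xxmacro__ret__; ARGS } MACRODATA

xxFuncMacroPrototype(float, xxSUM_data, int x; float y; );
static xxSUM_data xxsum;   /* static is an assumption: keeps it file-local */

#define SUM(X, Y) ( xxsum = (xxSUM_data){ .x = (X), .y = (Y) }, \
    xxsum.xxmacro__ret__ = xxsum.x + xxsum.y, \
    xxsum.xxmacro__ret__)

/* Hypothetical helper: the macro is used as an ordinary expression. */
float sum_value_demo(void) { return SUM(1, 2.5f); }   /* 3.5 */

/* Hypothetical helper: the argument i++ is evaluated exactly once. */
int sum_single_evaluation_demo(void)
{
    int i = 1;
    float s = SUM(i++, 2.5f);
    (void) s;
    return i;   /* 2 */
}
```

Since every invocation writes to the same object xxsum, it seems prudent to avoid using SUM twice inside one expression without a sequence point in between, for example as both operands of a single +.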

Why all that stuff? Why a typedef struct? Using a struct is better than using individual variables, because the information is packed into one object and the data stay hidden from the rest of the program. We don't want to define "a lot of variables" to hold the arguments of each macro in the program. Instead, by systematically defining a typedef struct associated with each macro, such macros become easier to handle.

Can we avoid the "external variable" xxsum above? Since compound literals are lvalues, one can believe that this is possible.
In fact, we can define this kind of macros, as shown in:

  • How to verify a type in a C macro

But in practice, I could not find a way to implement it safely.
For example, the macro SUM(X,Y) above cannot be implemented with this method only.
(I tried to make some tricks with pointer-to-struct + compound literals, but it seems impossible).

UPDATE:

(5.) Breaking my code.

The example given in Section 1 can be broken this way (as Chris Dodd showed me in his comment, below):

int x = 5;          /* x defined outside the macro */
PRINTINT_SQUARE(x);

Since inside the macro there is another object named x (this one: int x = (X);, where X is the formal parameter of the macro PRINTINT_SQUARE(X)), what is actually "passed" as the argument is not the value 5 defined outside the macro, but another one: a garbage value.
To understand it, let us write out the two lines above after macro expansion:

int x = 5;
{ int x = (x); printf("%d\n", x*x); }

The variable x inside the block is initialized... to its own indeterminate value!
In general, the technique developed in sections 1 to 3 for block macros can be broken in a similar way, as long as the struct object we use to hold the parameters is declared inside the block.

This shows that this kind of code can be broken, so it is unsafe:

Don't try to declare "local" variables "inside" the macro to hold the parameters.

  • Is there a "solution"? I answer "yes": I think that, in order to avoid this problem in the case of block macros (as developed in sections 1 to 3), we have to repeat what we did for function-like macros, that is: to declare the holding-parameters struct outside the macro, just after the xxMacroPrototype() line.

This is less ambitious, but it still answers the question: "How much is it possible to...?". On the other hand, we now follow the same approach for both cases: block and function-like macros.
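
A sketch of that repair for PRINTINT_SQUARE could look like this. The type name xxPRINTINT_SQUARE_data, the variable xxprintint_square and the helper outer_x_demo are hypothetical names chosen for the illustration, and making the variable static is an additional assumption.

```c
#include <stdio.h>

#define xxMacroPrototype(NAME, ARGS) typedef struct { ARGS } NAME
#define xxbeg_macroblock do {
#define xxend_macroblock } while(0)

xxMacroPrototype(xxPRINTINT_SQUARE_data, int x; );
/* Hypothetical holder object, declared outside the macro. */
static xxPRINTINT_SQUARE_data xxprintint_square;

#define PRINTINT_SQUARE(X)                                            \
  xxbeg_macroblock                                                    \
       xxprintint_square = (xxPRINTINT_SQUARE_data){ .x = (X) };      \
       printf("%d\n", xxprintint_square.x * xxprintint_square.x);     \
  xxend_macroblock

/* Hypothetical helper: returns the value the macro actually received. */
int outer_x_demo(void)
{
    int x = 5;               /* same name as the pseudoparameter */
    PRINTINT_SQUARE(x);      /* prints 25: .x names a member, (x) the caller's variable */
    return xxprintint_square.x;
}
```

The macro no longer declares anything in the caller's scope, so a caller variable named x is read correctly; a clash would now require the caller to have a local named xxprintint_square.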



Answer 2:

While the self-answered technique for a function-like macro is clever, it does not provide the "generality" of the original "unsafe" macro, since it will not allow arbitrary types to be passed in. And, once the macro is resigned to only work for a specific type, then it is simpler, safer, and easier to maintain an inline function instead.

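/* static avoids the C99 rule that a plain inline definition needs an external definition elsewhere */
static inline float sum_f (float x, float y) { return x + y; }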

With C11, you can use the new generic selection operator _Generic to define a macro that can call the appropriate inline function given the type of the arguments. The type selection expression (the first argument to _Generic) is used to determine the type, but the expression itself is not evaluated.

#define SUM(X, Y) \
    _Generic ( (X)+(Y) \
             , float : sum_f(X, Y) \
             , default : sum_i(X, Y) )
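
For completeness, here is a self-contained sketch of that macro. The answer does not show sum_i, so its definition below is an assumption, as is the static qualifier on both functions; generic_single_evaluation_demo is a hypothetical helper. A C11 compiler is required.

```c
/* sum_i is not shown in the answer; this definition is an assumption. */
static inline int   sum_i (int x, int y)     { return x + y; }
static inline float sum_f (float x, float y) { return x + y; }

#define SUM(X, Y) \
    _Generic ( (X)+(Y) \
             , float : sum_f(X, Y) \
             , default : sum_i(X, Y) )

/* Hypothetical helper: shows that the argument is evaluated only once. */
int generic_single_evaluation_demo(void)
{
    int i = 1;
    int s = SUM(i++, 2);   /* the controlling expression is not evaluated,
                              and only the selected branch runs */
    (void) s;
    return i;              /* 2 */
}
```

Note that with this selection list a double argument, such as the literal 2.2, falls into the default branch and is therefore handled by sum_i; a double entry would have to be added to cover that case.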


Tags: c macros c99