Valid preprocessor tokens in macro concatenation

Posted 2019-04-22 12:07

I am trying to understand C macros that use the ## concatenation preprocessor operator, but I realized that I have a problem with tokens. I thought it was easy, but in practice it is not.

So concatenation joins two tokens to create a new token, e.g. concatenating ( and ), or int and *.

I tried

#define foo(x,y) x ## y
foo(x,y)

Whenever I give it some arguments, I always get an error saying that pasting the two arguments does not give a valid preprocessor token.

For instance, why does foo(1,aa) result in 1aa (which type of token is it, and why is it valid?), while foo(int,*) gives an error?

Is there a way to know which tokens are valid, or is there a good link that would clarify this for me? (I have already searched Google and SO.)

What am I missing?

I will be grateful.

3 answers
Root(大扎)
Reply #2 · 2019-04-22 12:28

A preprocessing token is defined by the C language grammar; see section 6.4 of the current standard:

preprocessing-token:
                   header-name
                   identifier
                   pp-number
                   character-constant
                   string-literal
                   punctuator
                   each non-white-space character that cannot be one of the above

The meaning of each of those terms is defined elsewhere in the grammar. Most are self-explanatory; identifier means anything that is a valid variable name (or would be if it wasn't a keyword), and pp-number includes integer and floating point constants.

In Standard C, the result of pasting two preprocessing tokens must be another valid preprocessing token. Historically, some preprocessors have allowed other pastes (which is equivalent to not pasting at all), but this causes confusion when people compile their code with a different compiler.
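As a rough illustration of that rule, the sketch below pastes a few token pairs that do form a single preprocessing token and checks the results at compile time. The macro names PASTE, STR and XSTR are made up for this example, and the invalid pastes are left in a comment because a conforming compiler may reject them.

```c
#include <assert.h>   /* static_assert (C11) */

/* Hypothetical helper macros for this example. */
#define PASTE(a, b) a ## b
#define STR(x) #x
#define XSTR(x) STR(x)   /* expand x first, then stringify the result */

/* identifier ## identifier -> identifier: declares the constant foobar */
enum { PASTE(foo, bar) = 1 };
static_assert(foobar == 1, "foo and bar paste to the identifier foobar");

/* identifier ## pp-number -> identifier: declares the constant x1 */
enum { PASTE(x, 1) = 2 };
static_assert(x1 == 2, "x and 1 paste to the identifier x1");

/* pp-number ## pp-number -> pp-number */
static_assert(PASTE(1, 2) == 12, "1 and 2 paste to the number 12");

/* punctuator ## punctuator -> punctuator */
static_assert((1 PASTE(<, <) 3) == 8, "two < paste to the operator <<");

/* pp-number ## identifier -> pp-number: 1aa is one token even though it is
   not a valid constant, so it can be stringified but not used as a number */
static_assert(sizeof(XSTR(PASTE(1, aa))) == sizeof("1aa"), "pastes to 1aa");

/* Not valid: neither pair forms a single preprocessing token.
   PASTE(int, *)
   PASTE(x, +)
*/
```

If the file compiles, every paste above produced the token named in its comment.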

Explosion°爆炸
Reply #3 · 2019-04-22 12:39

Since it seems to be a point of confusion, the string 1aa is a valid preprocessor token; it is an instance of pp-number, whose definition is (§6.4.8 of the current C standard):

     pp-number:
            digit
            . digit
            pp-number       digit
            pp-number       identifier-nondigit
            pp-number       e sign
            pp-number       E sign
            pp-number       p sign
            pp-number       P sign
            pp-number       .

In other words, a pp-number starts with a digit or a . followed by a digit, and after that it can contain any sequence of digits, periods, "identifier-nondigits" (that is, letters, underscores, and other things which can be part of an identifier) or the letters e or p (either upper or lower-case) followed by a plus or minus sign.

That means that, for example, 0x1e+2 is a valid pp-number, while 0x1f+1 is not (it is three tokens). In a valid program, every pp-number which survives the preprocessing phases must satisfy the syntax of some numeric constant representation, which means that a program which includes the text 0x1e+2 will be considered invalid. The moral, if there is one, is that you should use whitespace generously; it has no cost.
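The difference between those two spellings can be checked at compile time. The following is only a small sketch: the line that uses 0x1e+2 without whitespace is kept inside a comment, since it is expected to be rejected.

```c
#include <assert.h>   /* static_assert (C11) */

/* 0x1f+1 is three tokens (0x1f, +, 1): f is not an exponent letter,
   so the pp-number ends before the plus sign. */
static_assert(0x1f+1 == 32, "31 plus 1");

/* 0x1e+2 would be ONE pp-number, because e followed by a sign continues
   the token. It is not a valid constant, so this line stays commented out:
   static_assert(0x1e+2 == 32, "does not compile");
   Whitespace splits it into the three tokens that were intended. */
static_assert(0x1e + 2 == 32, "30 plus 2");
```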

The intention of pp-number is to include everything which might eventually be a number in some future version of C. (Remember that numbers can be followed by alphabetic suffixes indicating types and signedness, such as 27LU).

However, int* is not a valid preprocessor token. It is two tokens (as is -3) and so it cannot be formed with the token concatenation operator.

Another odd consequence of the token-pasting rule is that it is impossible to generate the valid token ... through token concatenation, because .. is not a valid token. (a##b##c must be evaluated in some order, so even if all three macro parameters are replaced by ., there must be an attempt to create the token .., which will fail in most compilers, although I believe Visual Studio accepts it.)

Finally, comment symbols /* and // are not tokens; comments are replaced with whitespace before the separation of the program text into tokens. So you cannot produce a comment with token-pasting either (at least, not in a compliant compiler).

Bombasti
Reply #4 · 2019-04-22 12:50

Preprocessor token concatenation is for generating new tokens, but it is not capable of pasting arbitrary language constructs together (see, for example, the GCC documentation):

However, two tokens that don't together form a valid token cannot be pasted together. For example, you cannot concatenate x with + in either order.

So an attempt at a macro that makes a pointer out of a type like

#define MAKEPTR(NAME)  NAME ## *
MAKEPTR(int) myIntPtr;

is invalid, as int* is two tokens, not one.

The example from the GCC documentation mentioned above, however, shows the generation of new tokens:

 #define COMMAND(NAME)  { #NAME, NAME ## _command }

 struct command commands[] =
 {
   COMMAND (quit),
   COMMAND (help),
   ...
 };

yields:

 struct command commands[] =
 {
   { "quit", quit_command },
   { "help", help_command },
   ...
 };

The token quit_command did not exist before; it was generated through token concatenation.
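To see the table in a complete translation unit, here is a minimal sketch. The layout of struct command, the bodies of the two handler functions, and the run_command lookup helper are assumptions added for illustration; only the COMMAND macro itself comes from the example above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Assumed for this sketch: the shape of struct command and the two
   handler functions are not given in the text above. */
struct command {
    const char *name;
    int (*function)(void);
};

static int quit_command(void) { return 0; }
static int help_command(void) { return 1; }

/* #NAME stringifies the argument; NAME ## _command pastes two
   identifiers into one new identifier. */
#define COMMAND(NAME) { #NAME, NAME ## _command }

static const struct command commands[] = {
    COMMAND(quit),   /* { "quit", quit_command } */
    COMMAND(help),   /* { "help", help_command } */
};
static_assert(sizeof commands / sizeof commands[0] == 2, "two entries");

/* Hypothetical helper: look up a command by name and run it;
   returns -1 if the name is unknown. */
static int run_command(const char *name)
{
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strcmp(commands[i].name, name) == 0)
            return commands[i].function();
    }
    return -1;
}
```

Calling run_command("help") from a main function would dispatch to help_command through the pasted identifier.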

Note that a macro of the form

#define MAKEPTR(TYPE)  TYPE*
MAKEPTR(int) myIntPtr;

is valid and actually generates a pointer type out of TYPE, e.g. int* out of int.
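A short sketch of that plain-substitution macro in use follows. The function names are invented for this example, and the second function shows a caveat that I believe is worth knowing: the * binds only to the first declarator, so a second variable in the same declaration is a plain int.

```c
#include <assert.h>   /* static_assert (C11) */

/* Plain substitution: the replacement list is the two tokens TYPE and *. */
#define MAKEPTR(TYPE) TYPE*

/* (MAKEPTR(int))0 expands to the cast (int*)0, whose type is int *. */
static_assert(_Generic((MAKEPTR(int))0, int *: 1, default: 0),
              "MAKEPTR(int) names the type int*");

/* Writes value through a pointer declared with the macro and reads it
   back; store_and_load is a made-up name for this example. */
static int store_and_load(int value)
{
    int target = 0;
    MAKEPTR(int) p = &target;   /* expands to: int* p = &target; */
    *p = value;
    return target;
}

/* Caveat: returns 1 because b is an int, not an int*. */
static int second_is_plain_int(void)
{
    MAKEPTR(int) a = 0, b = 0;  /* expands to: int* a = 0, b = 0; */
    (void)a;
    return _Generic(b, int: 1, default: 0);
}
```

For declaring several pointers at once, a typedef is usually the safer tool than a macro like this.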
