undefined C/C++ symbol as operator

Posted 2019-07-05 02:54

Question:

I notice that the characters '`' and '@' are not used as operators in C/C++.

  1. Does anyone know the reason, or the history, behind this?
  2. If they really are unused, is it safe to define those symbols as another operator/statement using #define?

Answer 1:

Normally, #define only accepts valid identifiers in the macro name - so you cannot do:

#define @      at
#define @(x)   [x]

Similarly with back-quote. And you didn't mention '$', which is sometimes allowed in identifiers.

There might be a compiler-specific extension to allow such mappings, but I wouldn't use it.
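A portable alternative is to give the macro an ordinary identifier as its name. The following is only a minimal sketch; the names AT and third_element are hypothetical and chosen purely for illustration:

```c
/* Hypothetical names: AT and third_element are illustrative only. */

/* A function-like macro whose name is a valid identifier,
 * standing in for the rejected "#define @(x) [x]". */
#define AT(array, index) ((array)[(index)])

/* Return the element at index 2 using the macro above. */
int third_element(const int *values)
{
    return AT(values, 2);
}
```

Because AT is a plain identifier, every conforming preprocessor accepts it, and a reader can search for its definition by name.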


As to the historical reason for this, parts of the ISO 646 character set are reserved to national implementations for national characters. These reserved positions include the characters that cause trouble. Trigraphs and digraphs were added to Standard C (and hence Standard C++) in 1989 and 1994 respectively to provide workarounds for the problems.

Trigraphs

Trigraphs were added during the C89 standardization process to prevent people from, for example, having to see alphabetic characters (in Scandinavian languages) used in their C code (adapted from an example in B Stroustrup, 'Design and Evolution of C++', using a Danish terminal):

#include <stdio.h>
int main(int argc, char *argvÆÅ)
æ
    if (argc < 2 øø *argvÆ1Å == 'Ø0') return 0;
    printf("Hello, %sØn", argvÆ1Å);
å

Or, in the ISO 8859-1 code set (or any of the ISO 8859-x code sets):

#include <stdio.h>
int main(int argc, char *argv[])
{
    if (argc < 2 || *argv[1] == '\0') return 0;
    printf("Hello, %s\n", argv[1]);
}

The trigraphs were introduced to produce a neutral format for the code:

??=include <stdio.h>
int main(int argc, char *argv??(??))
??<
    if (argc < 2 ??!??! *argv??(1??) == '??/0') return 0;
    printf("Hello, %s??/n", argv??(1??));
??>

That's not very readable, either, but it is the same for everyone.

Trigraph      Equivalent to
??/           \      backslash
??<           {      open brace
??>           }      close brace
??(           [      open square bracket
??)           ]      close square bracket
??=           #      hash (pound in American, but a pound is £ in English)
??'           ^      caret
??!           |      pipe
??-           ~      tilde

The standard says 'there are no other trigraphs'. This is why the escape sequence '\?' is recognized (as a simple question mark - though presumably that is '??/?'). Note that the GNU Compiler Collection (GCC) does not interpret trigraphs unless you hold its hand to the fire (specify '-trigraphs' on the command line).
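The trigraph table can be sketched as a small translation routine. This is only an illustration of the mapping, not how any particular compiler implements it; the function name replace_trigraphs is hypothetical, and the caller is assumed to supply a destination buffer at least as large as the source string:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src to dst, replacing each of the nine
 * trigraphs with its single-character equivalent.
 * Assumption: dst has room for at least strlen(src) + 1 bytes.
 * Returns the length of the translated string.
 * The lookup strings hold only the third character of each trigraph,
 * so this source file itself contains no trigraph sequences. */
size_t replace_trigraphs(const char *src, char *dst)
{
    static const char third[]  = "=(/)'<!>-";   /* third character of each trigraph */
    static const char single[] = "#[\\]^{|}~";  /* replacement at the same position */
    size_t out = 0;

    while (*src != '\0') {
        const char *hit = NULL;

        /* Two question marks followed by a listed character form a trigraph. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(third, src[2]);

        if (hit != NULL) {
            dst[out++] = single[hit - third];
            src += 3;
        } else {
            dst[out++] = *src++;   /* anything else is copied unchanged */
        }
    }
    dst[out] = '\0';
    return out;
}
```

When calling it from C source, it is safest to split the question marks across adjacent string literals (for example "?" "?=") so that a compiler with trigraphs enabled does not translate the test input itself.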

Digraphs

The digraphs were added in 1994, and are not as pervasive or intrusive as trigraphs; they are recognized only outside string and character literals. The digraphs are:

Digraph       Equivalent to
<:            [
:>            ]
<%            {
%>            }
%:            #
%:%:          ##

The example using digraphs (and trigraphs):

%:include <stdio.h>
%:include <iso646.h>
int main(int argc, char *argv<::>)
<%
    if (argc < 2 or *argv<:1:> == '??/0') return 0;
    printf("Hello, %s??/n", argv<:1:>);
%>
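A digraph-only sketch, for comparison, needs no trigraph option. It assumes a compiler that accepts digraphs in its default mode, and the function names sum_first_three and literal_length are hypothetical:

```c
%:include <string.h>

/* Add up the first three elements; the square brackets and braces
 * are spelled with digraphs. */
int sum_first_three(const int values<::>)
<%
    return values<:0:> + values<:1:> + values<:2:>;
%>

/* Digraphs are tokens, so inside a string literal the two
 * characters of "<:" are left alone rather than becoming "[". */
size_t literal_length(void)
<%
    return strlen("<:");
%>
```

The second function illustrates the difference from trigraphs: a trigraph is replaced even inside a string literal, while a digraph is not.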

At sign and back quote specifically?

If you look at the Wikipedia article on ISO 646, you'll see that both '@' and '`' are sometimes replaced by national characters, and hence are not good choices for operators or identifiers. An additional reason for not using '@' is that at the time C was introduced, '#' was the default erase character and '@' was the kill (line erase) character for terminals. So, you had to remember to escape them. Since '#' only appeared at the beginning of a line, it wasn't too much of a problem (using '#' and '##' came much, much later, with standardization again), but '@' would have wiped out all the preceding typing on the line. And this was in the days before 'vi', when 'ed is the standard Unix editor'.



Answer 2:

It is probably safe, but it's almost certainly a really bad idea. Since @ is not a standard operator, anyone else reading your code will have to track down the definition of @ wherever you use it. We name functions, rather than just using symbols, so that humans reading the code can figure out what it does.

On a side note, Objective-C uses @. Not sure if that's relevant to your project, but if anyone tried to use your C code from ObjC, all their code would break because of your #define.



Answer 3:

With regard to C:

A #defined macro has a name that is a C identifier (§6.10).

An identifier may consist of _a-zA-Z0-9 (§6.4.2.1). Anything else is implementation-defined. If you use @ in a macro name, it may work on some compilers (though I would be surprised), but it will not be portable.

I don't know what the situation is with C++.
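The portable identifier rule cited above can be sketched as a small check. The function name is_portable_identifier is hypothetical, and the sketch deliberately ignores keywords, universal character names, and implementation-defined extras such as '$'. It also adds the requirement that an identifier must not begin with a digit:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if name consists only of the
 * portable identifier characters (_ a-z A-Z 0-9) and does not start
 * with a digit. Keywords and implementation-defined characters are
 * not considered. */
bool is_portable_identifier(const char *name)
{
    /* Explicit character lists avoid assuming that letters are
     * contiguous in the execution character set. */
    static const char nondigits[] =
        "_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char digits[] = "0123456789";

    if (name == NULL || *name == '\0')
        return false;

    /* The first character must be a letter or underscore. */
    if (strchr(nondigits, name[0]) == NULL)
        return false;

    /* The remaining characters may also be digits. */
    for (const char *p = name + 1; *p != '\0'; ++p) {
        if (strchr(nondigits, *p) == NULL && strchr(digits, *p) == NULL)
            return false;
    }
    return true;
}
```

Under this check "at" and "_x9" pass, while "@", "9x" and "a$b" are rejected, which matches the point that a macro named @ is not portable.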