Why is "\?" an escape sequence in C/C++?

Posted 2019-01-11 09:30

Question:

There are four special non-alphabetic characters that need to be escaped in C/C++: the single quote \', the double quote \", the backslash \\, and the question mark \?. Apparently this is because they have special meanings: ' delimits a single char, " delimits string literals, and \ introduces escape sequences. But why is ? one of them?

I read the table of escape sequences in a textbook today and realized that I've never escaped ? before and never encountered a problem with it. Just to be sure, I tested it under gcc:

#include <stdio.h>
int main(void)
{
    printf("question mark ? and escaped \?\n");
    return 0;
}

and the C++ version:

#include <iostream>
int main(void)
{
    std::cout << "question mark ? and escaped \?" << std::endl;
    return 0;
}

Both programs output: question mark ? and escaped ?

So I have two questions:

  1. Why is \? one of the escape sequences?
  2. Why does an unescaped ? work fine, without even a warning?

Just as I was about to ask this question, I found the answer myself. Since I didn't find a duplicate on SO, I decided to post it in Q&A style.

Interestingly, the escaped \? can also be used the same way as ? in some other languages. I tested Lua and Ruby and it held there too, even though I didn't find this documented.

Answer 1:

Why is \? one of the escape sequences?

Because ? is special too, and the reason is trigraphs. Early in translation, before the source is split into tokens, a C/C++ compiler replaces each of the following three-character sequences with the corresponding single character (C11 §5.2.1.1 and C++11 §2.3):

Trigraph:       ??(  ??)  ??<  ??>  ??=  ??/  ??'  ??!  ??-
Replacement:      [    ]    {    }    #    \    ^    |    ~
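
To make the table concrete, here is a rough sketch of the substitution written as an ordinary C function. The name replace_trigraphs and the two lookup strings are made up for illustration; a real compiler performs this step internally in its first translation phase, not through any such function.

```c
#include <stddef.h>
#include <string.h>

/* Third character of each trigraph and the character it stands for,
   in the same order as the table above. (Illustrative names.) */
static const char third[]       = "()<>=/'!-";
static const char replacement[] = "[]{}#\\^|~";

/* Copies src to dst, replacing every trigraph on the way.
   dst must have room for strlen(src) + 1 bytes. Returns dst. */
char *replace_trigraphs(const char *src, char *dst)
{
    char *out = dst;
    while (*src != '\0') {
        const char *hit = NULL;
        /* A trigraph is two question marks plus one of the nine
           characters listed in `third`. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(third, src[2]);
        if (hit != NULL) {
            *out++ = replacement[hit - third];
            src += 3;
        } else {
            *out++ = *src++;   /* anything else is copied unchanged */
        }
    }
    *out = '\0';
    return dst;
}
```

The test inputs for such a function have to be written with \? themselves, otherwise a compiler with trigraphs enabled would already have replaced them before the function ever ran.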

Trigraphs are nearly useless today (they were removed in C++17 and C23) and are mainly used for obfuscation; some examples can be seen in IOCCC entries.

gcc does not process trigraphs by default and warns when it finds one in the code, unless the -trigraphs option is given. With -trigraphs, escaping the question marks matters, as in the following example:

printf("\?\?!\n");  

If the ? characters were not escaped, the literal would contain the trigraph ??! and the output would be |.
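
As a small sketch of how to keep such text safe regardless of compiler flags, the functions below spell the same three characters in two ways. The function names are made up for illustration. The second spelling relies on the fact that trigraph replacement happens before adjacent string literals are concatenated, so splitting the literal between the question marks also prevents the replacement.

```c
#include <stdio.h>
#include <string.h>

/* Always the three characters ? ? ! because the backslashes keep two
   question marks from ever being adjacent in the source. */
const char *escaped_marks(void)
{
    return "\?\?!";
}

/* Same text without \?: the two question marks sit in separate
   literals that are only joined after trigraphs were handled. */
const char *split_marks(void)
{
    return "?" "?!";
}

/* Prints both spellings together with their lengths. */
void show_marks(void)
{
    printf("%s (%zu chars)\n", escaped_marks(), strlen(escaped_marks()));
    printf("%s (%zu chars)\n", split_marks(), strlen(split_marks()));
}
```

Calling show_marks() from main should print the same three-character line twice, whether or not -trigraphs is given.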

For more information on trigraphs, see Cryptic line "??!??!" in legacy code.


Why does an unescaped ? work fine, without even a warning?

Because the standard allows ? (and the double quote ") to be represented by themselves:

C11 §6.4.4.4 Character constants, paragraph 4

The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.
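
A minimal sketch of what this paragraph means in practice; the function name same_spelling is made up for illustration. Each comparison pairs the plain spelling of a character with its escaped spelling.

```c
#include <string.h>

/* Returns 1 if the plain and escaped spellings denote the same
   characters, which the quoted paragraph says they must. */
int same_spelling(void)
{
    return '?' == '\?'                /* escape optional for ?          */
        && '"' == '\"'                /* escape optional for " in a
                                         character constant            */
        && strcmp("?", "\?") == 0     /* same rule in string literals   */
        && "'"[0] == '\''             /* ' needs no escape in a string,
                                         but does in a char constant    */
        && "\\"[0] == '\\';           /* backslash is always escaped    */
}
```

The character constants ''' and '\' would not compile, which is the "shall be represented" half of the rule.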

Similar in C++:

C++11 §2.13.2 Character literals, paragraph 3

Certain nongraphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 6. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. If the character following a backslash is not one of those specified, the behavior is undefined. An escape sequence specifies a single character.