Unknown meta-character in C/C++ string literal?

Posted 2019-06-20 01:01

I created a new project with the following code segment:

const char* strange = "(Strange??)";
cout << strange << endl;

resulting in the following output:

(Strange]

So '??)' is being translated to ']'.

Debugging it shows that the string literal itself holds that value, so it's not a stream translation. This is not a meta-character sequence I've ever seen. Some sort of Unicode or wide-char sequence, perhaps? I don't think so, though. I've tried disabling all related project settings, to no avail.

Anyone have an explanation?

  • search : 'question mark, question mark, close brace' c c++ string literal

9 answers
干净又极端
#2 · 2019-06-20 01:36

Trigraphs are the reason. What the article says about C also applies to C++.

何必那么认真
#3 · 2019-06-20 01:42

While I was trying to cross-compile with GCC, it picked my sequence up as a trigraph:

So all I need to do now is figure out how to disable this in projects by default, since I can only see it creating problems for me (I'm using a US keyboard layout anyway).

GCC's default behavior is to ignore trigraphs but give a warning, which is much saner and, as far as I know, is what Visual Studio 2010 will adopt as its default.

爷、活的狠高调
#4 · 2019-06-20 01:46

Easy way to avoid the trigraph surprise: split a "??" string literal in two:

const char* strange = "(Strange??)";
const char* strange2 = "(Strange?" "?)";
/*                               ^^^ no trigraph across the split */
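The split works because adjacent string literals are only concatenated in translation phase 6, after phase 1 has already looked for trigraphs. A minimal sketch (C++11 or later; `strange2` as an array and `print_strange2` are illustrative choices, not from the answer) that checks at compile time that both question marks survive:

```cpp
#include <iostream>

// Adjacent string literals are concatenated in translation phase 6, long
// after trigraphs are replaced in phase 1, so the two question marks are
// never next to each other while the compiler looks for trigraphs.
constexpr char strange2[] = "(Strange?" "?)";

// 11 visible characters plus the terminating '\0': nothing was replaced.
static_assert(sizeof(strange2) == 12, "both question marks survived");

void print_strange2() {
    std::cout << strange2 << '\n';  // prints the literal unchanged
}
```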

Edit: gcc has an option to warn about trigraphs, -Wtrigraphs (also enabled by -Wall).

Quotes from the C Standard

    5.2.1.1 Trigraph sequences
1   Before any other processing takes place, each occurrence of one of the
    following sequences of three characters (called trigraph sequences)
    is replaced with the corresponding single character.
           ??=      #               ??)      ]               ??!      |
           ??(      [               ??'      ^               ??>      }
           ??/      \               ??<      {               ??-      ~
    No other trigraph sequences exist. Each ? that does not begin one of
    the trigraphs listed above is not changed.

    5.1.1.2 Translation phases
1   The precedence among the syntax rules of translation is specified by
    the following phases.
         1.   Physical source file multibyte characters are mapped, in an
              implementation-defined manner, to the source character set
              (introducing new-line characters for end-of-line indicators)
              if necessary. Trigraph sequences are replaced by corresponding
              single-character internal representations.
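Phase 1's trigraph step can be modeled in a few lines of C++. This is a simplified sketch, not how any real compiler does it: the names `trigraph_target`, `replace_trigraphs` and `check_asker_example` are hypothetical, C++17 is assumed for `std::string_view`, and the input is an ordinary string rather than a source file. Every string in the sketch is written so the file never contains two adjacent question marks itself, so it compiles the same way whether or not your compiler replaces trigraphs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Maps the third character of a trigraph to its replacement, following
// the table quoted above. Returns '\0' when two question marks followed
// by c do not form a trigraph.
constexpr char trigraph_target(char c) {
    switch (c) {
        case '=':  return '#';
        case '(':  return '[';
        case '/':  return '\\';
        case ')':  return ']';
        case '\'': return '^';
        case '<':  return '{';
        case '!':  return '|';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Simplified model of the trigraph step of translation phase 1: scan left
// to right, replace each trigraph with its single character, and copy
// every other character (including a lone '?') unchanged.
std::string replace_trigraphs(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i + 2 < in.size() && in[i] == '?' && in[i + 1] == '?') {
            const char t = trigraph_target(in[i + 2]);
            if (t != '\0') {
                out += t;
                i += 2;  // skip the rest of the three-character sequence
                continue;
            }
        }
        out += in[i];
    }
    return out;
}

// The asker's literal, spelled as two adjacent literals so this file
// contains no raw trigraph, comes out the way the program printed it.
void check_asker_example() {
    assert(replace_trigraphs("(Strange?" "?)") == "(Strange]");
}
```

Because the scan goes left to right, three question marks followed by `)` become `?]`, which matches the rule that a `?` not beginning a trigraph is left unchanged.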
太酷不给撩
#5 · 2019-06-20 01:47

That's trigraph support. You can prevent trigraph interpretation by escaping the second question mark as \?, so the two question marks are no longer adjacent:

const char* strange = "(Strange?\?)";
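The escape has to go on the second question mark. In `"(Strange\??)"` the backslash comes before the pair, so phase 1 still finds `??)` and turns it into `]`, leaving `\]`, which is not one of the standard escape sequences. A small compile-time sketch (C++11 or later; declaring `strange` as a `constexpr` array is an assumption made so its contents can be checked with `static_assert`):

```cpp
// '\?' is a standard escape sequence that yields a plain '?'. Placed on
// the second question mark, it separates the pair, so translation phase 1
// finds no trigraph in this literal.
constexpr char strange[] = "(Strange?\?)";

static_assert('\?' == '?', "the escape is just a question mark");
```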
Explosion°爆炸
#6 · 2019-06-20 01:49
forever°为你锁心
#7 · 2019-06-20 01:54

What you're seeing is called a trigraph.

In written language by grown-ups, one question mark is sufficient for any situation. Don't use more than one at a time and you'll never see this again.

GCC ignores trigraphs by default (except in strict ISO modes such as -std=c99 or -ansi) because hardly anyone uses them intentionally. Enable them with the -trigraphs option, or tell the compiler to warn you about them with the -Wtrigraphs option.

Visual C++ 2010 also disables them by default and offers /Zc:trigraphs to enable them. I can't find anything about ways to enable or disable them in prior versions.
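If you are not sure which behavior a given compiler and set of flags gives you, one option is a compile-time probe. This is an illustrative sketch, not a documented feature of either compiler: `kProbe`, `kTrigraphsReplaced` and `trigraph_mode` are hypothetical names, and the literal deliberately contains a trigraph, so expect a -Wtrigraphs warning wherever trigraphs are ignored. (C++17 removed trigraphs from the language altogether.)

```cpp
// Compile-time probe (illustrative). This literal deliberately contains
// the trigraph for '#'. If phase 1 replaces it, the array holds "#" and
// is 2 bytes; if trigraphs are ignored it keeps all three characters
// and is 4 bytes. Compilers that ignore trigraphs may warn on this line,
// which is expected.
constexpr char kProbe[] = "??=";

constexpr bool kTrigraphsReplaced = sizeof(kProbe) == 2;

constexpr const char* trigraph_mode() {
    return kTrigraphsReplaced ? "trigraphs are replaced"
                              : "trigraphs are ignored";
}
```

Printing `trigraph_mode()` from `main`, or checking `kTrigraphsReplaced` in a `static_assert`, tells you which way the current build is configured.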
