In C++11 what should happen first: raw string expa

Posted 2019-04-17 18:58

Question:

This code works in Visual C++ 2013 but not in gcc/clang:

#if 0
R"foo(
#else
int dostuff () { return 23; }
// )foo";
#endif
dostuff();

Visual C++ executes the #if 0 first. Clang lexes the R raw string first (and so never defines dostuff). Who is right, and why?

Answer 1:

[Update: Adrian McCarthy says in a comment below that MSVC++ 2017 fixes this.]

GCC and clang are right, VC++ is wrong.

2.2 Phases of translation [lex.phases]:

[...]

  3. The source file is decomposed into preprocessing tokens (2.5) and sequences of white-space characters (including comments).

  4. Preprocessing directives are executed, [...]

And 2.5 Preprocessing tokens [lex.pptoken] lists string-literals amongst the tokens.

Consequently, the implementation is required to tokenise the raw string literal first (phase 3), and that single token "consumes" the #else and the dostuff function definition before any directive is executed (phase 4).



Answer 2:

I thought it was worth reiterating this interesting "quirk" of the lexing phase. The contents of an #if 0 ... #else group are not ignored the way you might naively imagine (I was naive until I tested it). Here are two examples; the only difference between them is an extra space between the R and the " in the raw string declaration inside the #if 0 block.

#include <iostream>
using namespace std;

#if 0 
const char* s = R"(
#else
int foo() { return 3; }
// )";
#endif

int main() {
    std::cout << foo() << std::endl;
    return 0;
}

This results in (gcc 6.3, C++14):

prog.cpp: In function ‘int main()’:
prog.cpp:12:19: error: ‘foo’ was not declared in this scope
  std::cout << foo() << std::endl;

Adding a single space character (in code that is supposedly ignored by the compiler!) lets it compile:

#include <iostream>
using namespace std;

#if 0 
const char* s = R "(
#else
int foo() { return 3; }
// )";
#endif

int main() {
    std::cout << foo() << std::endl;
    return 0;
}

It compiles and, when run, prints:

3

Note that a traditional (non-raw) string literal does not have this problem. A non-raw string literal cannot span a newline, so an unterminated " inside the #if 0 group cannot swallow the lines that follow it: the #else is still seen, and the rest of the group is skipped as usual. So if you get rid of the R, it compiles just fine.

Obviously, the safe thing to do is not to let a raw string literal cross a preprocessor conditional boundary.