Is it mandatory to escape tabulator characters in

Posted 2020-08-09 07:28

Question:

In C and C++ (and several other languages), horizontal tabulators (ASCII code 9) in character and string constants are denoted in escaped form as '\t' and "\t". However, I regularly type the unescaped tabulator character in string literals, for example in "A B" (there is a TAB between A and B), and at least clang++ does not seem to mind: the string seems to be equivalent to "A\tB". I like the unescaped version better, since long indented multi-line strings are more readable in the source code.

Now I am asking myself whether this is generally legal in C and C++ or just supported by my compiler. How portable are unescaped tabulators in character and string constants?

Surprisingly, I could not find an answer to this seemingly simple question, either with Google or on Stack Overflow (I found only this vaguely related question).

Answer 1:

Yes, you can include a tab character in a string or character literal, at least according to C++11. The allowed characters include (with my emphasis):

any member of the source character set except the double-quote ", backslash \, or new-line character

(from C++11 standard, annex A.2)

and the source character set includes:

the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters

(from C++11 standard, paragraph 2.3.1)

UPDATE: I've just noticed that you're asking about two different languages. For C99, the answer is also yes. The wording is different, but basically says the same thing:

In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or [...]

where both the source and execution character sets include

control characters representing horizontal tab, vertical tab, and form feed.



Answer 2:

It's completely legal to put a tab character directly into a character string or character literal. The C and C++ standards require the source character set to include a tab character, and string and character literals may contain any character in the source character set except backslash, quote or apostrophe (as appropriate) and newline.

So it's portable. But it is not a good idea, since there is no way a reader can distinguish between different kinds of whitespace. It is also quite common for text editors, mail programs, and the like to reformat tabs, so bugs may be introduced into the program in the course of such operations.



Answer 3:

If you enter a tab into an input, your string will contain a literal tab character, and it will stay a tab character; it won't be magically translated into \t internally.

Same goes for writing code - you can embed literal tab characters in your strings. However, consider this:

     T     T     T        <--tab stops
012345012345012345012345
foo1 = 'a\tb';
foo2 = 'a  b'; // pressed tab in the editor
foo3 = 'a  b'; // hit space twice in the editor

Unless you put the cursor on the whitespace between a and b and checked how many characters are in there, there is essentially NO way to determine if there's a tab or actual space characters in there. But with the \t version, it is immediately shown to be a tab.



Answer 4:

When you press the TAB key you get whatever code point your system maps that key to. That code point may or may not be a tab on the system where the program runs. When you put \t in a literal the compiler replaces it with the appropriate code point for the target system. So if you want to be sure that you get a tab on the system where the program runs, use \t. That's its job.



Tags: c++ c