In C and C++ (and several other languages), horizontal tabulators (ASCII code 9) in character and string constants are denoted in escaped form as '\t' and "\t". However, I regularly type the unescaped tabulator character in string literals, as for example in "A B" (there is a TAB between A and B), and at least clang++ does not seem to mind: the string seems to be equivalent to "A\tB". I like the unescaped version better, since long, indented multi-line strings are more readable in the source code.
Now I am asking myself whether this is generally legal in C and C++ or just supported by my compiler. How portable are unescaped tabulators in character and string constants?
Surprisingly, I could not find an answer to this seemingly simple question, neither with Google nor on Stack Overflow (I just found this vaguely related question).
It's completely legal to put a tab character directly into a character string or character literal. The C and C++ standards require the source character set to include a tab character, and string and character literals may contain any character in the source character set except backslash, quote or apostrophe (as appropriate) and newline.
So it's portable. But it is not a good idea, since there is no way a reader can distinguish between different kinds of whitespace. It is also quite common for text editors, mail programs, and the like to reformat tabs, so bugs may be introduced into the program in the course of such operations.
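Yes, you can include a tab character in a string or character literal, at least according to C++11. The grammar allows a literal to contain any member of the source character set except the closing quote character (single quote for character literals, double quote for string literals), backslash, and new-line (C++11 standard, annex A.2), and the basic source character set includes the control character representing horizontal tab (C++11 standard, paragraph 2.3.1).
UPDATE: I've just noticed that you're asking about two different languages. For C99, the answer is also yes. The wording is different, but it says basically the same thing: in a character constant or string literal, members of the execution character set are represented by corresponding members of the source character set or by escape sequences, and both the source and execution character sets include the control character representing horizontal tab.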
If you enter a tab into an input, then your string will contain a literal tab character, and it will stay a tab character; it won't be magically translated into \t internally.
The same goes for writing code: you can embed literal tab characters in your strings. However, consider a string with whitespace between a and b, written once with a raw tab and once with \t. Unless you put the cursor on the whitespace between a and b and check how many characters are there, there is essentially no way to determine whether it is a tab or actual space characters. With the \t version, it is immediately visible that it is a tab.
When you press the TAB key, you get whatever code point your system maps that key to. That code point may or may not be a tab on the system where the program runs. When you put \t in a literal, the compiler replaces it with the appropriate code point for the target system. So if you want to be sure that you get a tab on the system where the program runs, use \t. That's its job.