I am brushing up on my C++ and stumbled across a curious behavior regarding strings, character arrays, and the null character ('\0'). The following code:
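#include <iostream>
#include <string>
using namespace std;
int main() {
    // Index directly into the string literal.
    cout << "hello\0there"[6] << endl;
    // Index into a char array initialized from the literal.
    char word [] = "hello\0there";
    cout << word[6] << endl;
    // Index into a std::string constructed from the literal.
    string word2 = "hello\0there";
    cout << word2[6] << endl;
    return 0;
}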
produces the output:
> t
> t
>
What is going on behind the scenes? Why do the string literal and the declared char array store the 't' at index 6 (after the internal '\0'), but the declared string does not?
The problem is that you are not printing strings at all; you are printing single characters.
So you are invoking the "char" overloads of operator<<, not the "char*" or "string" overloads, and the null characters have nothing to do with the printing itself: you are just printing the character at index 6 of word and the character at index 6 of word2.
If I am reading your intent correctly, your test should read:
In C++11 and later this will also print "there" and be well defined.
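The difference between the char overload and the char* overload described above can be sketched as follows. This is only an illustration, not the test code the answer refers to; the helper names print_char_at and print_from are made up for this example, and an ostringstream stands in for cout so the result can be inspected.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: streams one char, so the char overload of
// operator<< writes exactly one character.
std::string print_char_at(const char* text, std::size_t index) {
    std::ostringstream out;
    out << text[index];
    return out.str();
}

// Hypothetical helper: streams a pointer into the buffer, so the
// const char* overload writes characters until the next '\0'.
std::string print_from(const char* text, std::size_t index) {
    std::ostringstream out;
    out << &text[index];
    return out.str();
}
```

With the literal from the question, print_char_at("hello\0there", 6) yields "t", while print_from("hello\0there", 6) yields "there" and print_from("hello\0there", 0) yields only "hello".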
You are constructing a string from a char* (or something that decayed to that). This means that the convention for C-strings applies: they are '\0'-terminated. That's why word2 only contains "hello".
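A minimal sketch of that convention, and of one way around it: the std::string constructor that takes a pointer and an explicit count copies exactly that many characters, embedded nulls included. The function names here are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <string>

// Constructed from a const char*: copying stops at the first '\0',
// so only "hello" is stored.
std::size_t size_from_c_string() {
    std::string s = "hello\0there";
    return s.size();
}

// Constructed from pointer + count: all 11 characters are copied.
// sizeof(literal) is 12 because it includes the trailing terminator.
std::string with_embedded_null() {
    const char literal[] = "hello\0there";
    return std::string(literal, sizeof(literal) - 1);
}
```

Here size_from_c_string() returns 5, while with_embedded_null() has size 11 and holds 't' at index 6.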
From what I remember, the first two are in essence just arrays, and the way a C string is printed is to keep printing until a \0 is encountered. Printing from the offset of index 6 would therefore print the rest of the array from that point, but in your case you are printing only the single character at index 6, which is t.
What happens with the string class is that it makes a copy of the string into its own internal buffer, and does so by copying from the start of the array up to the first \0 it finds. Thus the t is not stored, because it comes after the first \0.
That is because the std::string constructor that takes a const char* treats its argument as a C-style string. It simply copies from it until it hits a null terminator, then stops copying. So your last example is actually invoking undefined behaviour: word2[6] goes past the end of the string.
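One way to detect this kind of out-of-range access instead of silently invoking undefined behaviour is std::string::at, which is bounds-checked and throws std::out_of_range when the index is not less than size(). The helper below is a hypothetical sketch, not part of the original code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether at() accepts the index.
bool index_in_range(const std::string& s, std::size_t index) {
    try {
        (void)s.at(index);  // throws if index >= s.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

For a string constructed from "hello\0there" (which holds only "hello"), index_in_range returns true for index 4 and false for indices 5 and 6. Note that at() is stricter than operator[]: since C++11, operator[] is defined for an index equal to size() and refers to a null character there, but anything beyond that is undefined.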