Are list-initialized char arrays still null-terminated?

Posted 2019-04-30 05:41

Question:

As I worked through the Lippman C++ Primer (5th ed, C++11), I came across this code:

char ca[] = {'C', '+', '+'};  //not null terminated
cout << strlen(ca) << endl;  //disaster: ca isn't null terminated

Calling the library strlen function on ca, which is not null-terminated, results in undefined behavior. Lippman et al say that "the most likely effect of this call is that strlen will keep looking through the memory that follows ca until it encounters a null character."

A later exercise asks what the following code does:

const char ca[] = {'h','e','l','l','o'};
const char *cp = ca;
while (*cp) {
   cout << *cp << endl;
   ++cp;
}

My analysis: ca is a char array that is not null-terminated. cp, a pointer to const char, initially holds the address of ca[0]. The condition of the while loop dereferences cp, contextually converts the resulting char value to bool, and executes the loop body only if the conversion yields true. Since any non-null char converts to true, the loop body executes, printing the char and advancing the pointer by one char. The loop thus steps through memory, printing each char until it reaches a null character. Since ca is not null-terminated, the loop may continue well past the address of ca[4], interpreting the contents of later memory addresses as chars and writing their values to cout, until it happens to come across a byte that represents the null character (all zero bits). This behavior would be similar to what Lippman et al. suggest strlen(ca) does in the earlier example.

However, when I actually execute the code (compiled with g++ -std=c++11), the program consistently prints:

'h'
'e'
'l'
'l'
'o'

and terminates. Why?

Answer 1:

Most likely explanation: on modern desktop and server operating systems such as Windows and Linux, memory is zeroed out before it is mapped into a program's address space. So as long as the program hasn't used the adjacent memory locations for something else, the array will look like a null-terminated string. In your case, the adjacent bytes are probably just padding, as most variables are at least 4-byte aligned.

As far as the language is concerned, this is just one possible realization of undefined behavior.



Answer 2:

"Are list-initialized char arrays still null-terminated?"

There is no implicit null-terminator.

A list-initialized char array contains a null-terminated string if at least one of its elements is initialized to the null character.

If none of the elements is the null character, then the array does not contain a null-terminated string.

"the program consistently prints ... and terminates. Why?"

You analyzed that the array would be accessed out of bounds, and that analysis is correct. You should also know that accessing an array out of bounds has undefined behaviour. So the answer to why it behaves like this is: because the behaviour is undefined.

As I already mentioned, your analysis is correct. The only flaw is your (implied) assumption that, when memory is accessed out of bounds, the first value read must be non-zero. That assumption is wrong, because nothing guarantees it.