I'm very new to C and am a bit confused as to when we need to manually add the terminating '\0' character to strings. Given this function to calculate string length (for clarity's sake):
int stringLength(char string[])
{
    int i = 0;
    while (string[i] != '\0') {
        i++;
    }
    return i;
}
which calculates the string's length based on the null terminating character. So, using the following cases, what is the role of the '\0' character, if any?
Case 1:
char * stack1 = "stack";
printf("WORD %s\n", stack1);
printf("Length %d\n", stringLength(stack1));
Prints:
WORD stack
Length 5
Case 2:
char stack2[5] = "stack";
printf("WORD %s\n", stack2);
printf("Length %d\n", stringLength(stack2));
Prints:
WORD stack���
Length 8
(These results vary each time, but are never correct).
Case 3:
char stack3[6] = "stack";
printf("WORD %s\n", stack3);
printf("Length %d\n", stringLength(stack3));
Prints:
WORD stack
Length 5
Case 4:
char stack4[6] = "stack";
stack4[5] = '\0';
printf("WORD %s\n", stack4);
printf("Length %d\n", stringLength(stack4));
Prints:
WORD stack
Length 5
Case 5:
char * stack5 = malloc(sizeof(char) * 5);
if (stack5 != NULL) {
    stack5[0] = 's';
    stack5[1] = 't';
    stack5[2] = 'a';
    stack5[3] = 'c';
    stack5[4] = 'k';
    printf("WORD %s\n", stack5);
    printf("Length %d\n", stringLength(stack5));
}
free(stack5);
Prints:
WORD stack
Length 5
Case 6:
char * stack6 = malloc(sizeof(char) * 6);
if (stack6 != NULL) {
    stack6[0] = 's';
    stack6[1] = 't';
    stack6[2] = 'a';
    stack6[3] = 'c';
    stack6[4] = 'k';
    stack6[5] = '\0';
    printf("WORD %s\n", stack6);
    printf("Length %d\n", stringLength(stack6));
}
free(stack6);
Prints:
WORD stack
Length 5
Namely, I would like to know the difference between cases 1, 2, 3, and 4: why is case 2 erratic, why is there no need to specify the null-terminating character in cases 1 and 3, and why do 3 and 4 behave the same? Also, how do 5 and 6 print the same thing even though case 5 does not allocate enough memory for the null-terminating character? (Since only 5 char slots are allocated, one for each letter in "stack", how does it detect a '\0' character, i.e. the 6th character?)
I'm so sorry for this absurdly long question, it's just that I couldn't find a good didactic explanation of these specific instances anywhere else.
In case 1, you are creating a string literal (a constant that is typically placed in read-only memory), which has the \0 implicitly added to it. Since the position of the \0 is relied upon to find the end of the string, your stringLength() function returns 5.
In case 2, you are trying to initialise a character array of size 5 with a string of 5 characters, leaving no space for the \0 delimiter. The memory adjacent to the array can hold anything and might have a \0 somewhere. That \0 is treated as the end of the string, which explains the weird characters that you get. For the output you gave, it seems the \0 was found only after 3 more characters, which were also counted when calculating the string length. Since the contents of that memory change over time, the output may not always be the same.
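If you ever do have to work with a char array that has no terminator, as in case 2, the length has to come from somewhere other than a \0. Here is a minimal sketch of that idea. The names bounded_length and format_unterminated are made up for this illustration and are not standard functions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: like the question's stringLength(), but it never
   looks at more than `max` elements, so a missing '\0' cannot make it
   run off the end of the array. */
size_t bounded_length(const char *s, size_t max)
{
    size_t i = 0;
    while (i < max && s[i] != '\0') {
        i++;
    }
    return i;
}

/* Hypothetical helper: writes at most `len` characters of `src` into `out`.
   The "%.*s" precision caps how many characters are read from `src`,
   so `src` does not need to be null-terminated. */
int format_unterminated(char *out, size_t out_size, const char *src, int len)
{
    assert(len >= 0); /* a negative precision would be ignored by printf */
    return snprintf(out, out_size, "%.*s", len, src);
}
```

With an array such as char stack2[5] holding exactly the five letters, bounded_length(stack2, sizeof stack2) gives 5, and the same "%.*s" format works directly with printf.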
In case 3, you are initialising a character array of size 6 with a string of 5 characters, leaving enough space for the \0, which is stored implicitly. Hence, it works properly.
Case 4 is similar to case 3. The assignment stack4[5] = '\0'; changes nothing, because the size of stack4 is 6 and hence its last index is 5. You are overwriting an element with the value it already holds: stack4[5] had \0 in it even before you overwrote it.
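One way to convince yourself that cases 3 and 4 end up with identical arrays is to compare them byte by byte. A small sketch (the function name is made up for this illustration):

```c
#include <assert.h>
#include <string.h>

/* Returns 1 when the arrays from case 3 and case 4 hold identical bytes. */
int cases_3_and_4_match(void)
{
    /* The initializer fills all 6 elements: 's','t','a','c','k','\0' */
    char stack3[6] = "stack";
    char stack4[6] = "stack";

    assert(stack4[5] == '\0'); /* the terminator is already there */
    stack4[5] = '\0';          /* so this just writes the same value again */

    return memcmp(stack3, stack4, sizeof stack3) == 0;
}
```

The function should return 1, since the explicit assignment in case 4 does not change any byte of the array.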
In case 5, you have completely filled the allocated block with characters without leaving space for the \0. Yet when you print the string, it prints correctly. I think that is because the memory adjacent to the block allocated by malloc() merely happened to be zero, which is the value of \0. But this is undefined behavior and should not be relied upon. What really happens depends on the implementation.
It should be noted that malloc() does not initialise the memory that it allocates, unlike calloc().
Both str[2] = '\0'; and str[2] = 0; do exactly the same thing, since '\0' has the value 0.
But you cannot rely on the adjacent memory being zero. Dynamically allocated memory may happen to be zero-filled because of the way the operating system works, often for security reasons.
If you need dynamically allocated memory to start out as zero, you can use calloc().
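For case 5, that could look roughly like the sketch below: allocate one byte more than the number of characters and let calloc() supply the zeros. copy_zeroed is a hypothetical helper name chosen for this example, not a library function:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `len` characters from `src` into a freshly
   allocated buffer of len + 1 bytes. calloc() zero-fills the buffer, so
   the final byte already holds '\0' after the copy.
   The caller must free() the result. Returns NULL if allocation fails. */
char *copy_zeroed(const char *src, size_t len)
{
    assert(src != NULL);

    char *buf = calloc(len + 1, sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, src, len); /* buf[len] is left untouched and stays 0 */
    return buf;
}
```

Calling copy_zeroed("stack", 5) yields a properly terminated 6-byte buffer. With malloc() instead of calloc() you would have to write buf[len] = '\0'; yourself, as case 6 does.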
Case 6 has the \0 at the end and characters in the other positions, so the proper string is displayed when you print it.
The storage for a string must always leave room for the terminating null character. In some of your examples you don't do this, explicitly giving a length of 5. In those cases you will get undefined behavior.
String literals always get the null terminator automatically. Even though strlen returns a length of 5, the literal really takes 6 bytes.
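You can see the difference between the length and the storage by comparing strlen with sizeof. A short sketch (the three function names are only for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of characters before the terminator. */
size_t literal_length(void)
{
    return strlen("stack");
}

/* Number of bytes the literal occupies, terminator included. */
size_t literal_storage(void)
{
    return sizeof "stack";
}

/* An array whose size is taken from the literal gets room for the
   terminator as well. */
size_t array_storage(void)
{
    char word[] = "stack";
    assert(word[5] == '\0');
    return sizeof word;
}
```

Here literal_length() gives 5, while literal_storage() and array_storage() both give 6. Writing char word[] = "stack"; without an explicit size is an easy way to avoid the off-by-one in case 2.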
Your case 5 only works because undefined behavior sometimes means looking like it worked. You probably have a value of zero following the string in memory, but you can't rely on that.