About string length, terminating NUL, etc

Posted 2020-02-06 10:04

Question:

I'm currently learning C and I'm confused with differences between char array and string, as well as how they work.

Question 1:

Why is there a difference in the outcomes of source code 1 and source code 2?

Source code 1:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char c[2]="Hi";
    printf("%zu\n", strlen(c));   //returns 3 (not 2!?)
    return 0;
}

Source code 2:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char c[3]="Hi";
    printf("%zu\n", strlen(c));   //returns 2 (not 3!?)
    return 0;
}

Question 2:

How is a string variable different from a char array? How do I declare them with the minimum required index that still allows a \0 to be stored, if any (please see the code below)?

char name[index] = "Mick";   //should index be 4 or 5?

char name[index] = {'M', 'i', 'c', 'k'};   //should index be 4 or 5?

#define name "Mick"   //what is the size? Is there a \0?

Question 3:

Does the terminating NUL ONLY follow strings but not char arrays? So the actual value of the string "Hi" is [H][i][\0] and the actual value of the char array "Hi" is [H][i]?

Question 4:

Suppose c[2] is going to store "Hi" followed by a \0 (not sure how this is done, using gets(c) maybe?). So where is the \0 stored? Is it stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be appended with a \0 to become c[3] which is [H][i][\0]?

It is quite confusing that sometimes there is a \0 following the string/char array and causes trouble when I compare two variables by if (c1==c2) as it most likely returns FALSE (0).

Detailed answers are appreciated. But keeping your answer brief helps my understanding :) Thank you in advance!

Reply 1:

Answer 1: In code 1 you have a char array that is not a string; in code 2 you have a char array that is also a string.

Answer 2: A string is a char array in which (at least) one element has the value 0. If you leave the size empty, the compiler fills in the minimum size that fits the initializer, including the terminating 0.

char astring[] = "foobar"; /* compiler automagically uses 7 for size */
printf("%d\n", (int)sizeof astring);

Answer 3: a char array in which one of the elements is NUL is a string; a char array where no elements are NUL is not a string.

Answer 4: an array defined to hold two elements (char c[2];) cannot hold three elements. If it is going to be a string it can only be the empty string or a string with 1 character.



Reply 2:

Question 1:

Why is there a difference in the outcomes of source code 1 and source code 2?

Source code 1:

#include <stdio.h>
#include <string.h>

int main()
{
    char c[2]="Hi";
    printf("%zu", strlen(c));   //returns 3 (not 2!?)
    getchar();
}

Source code 2:

#include <stdio.h>
#include <string.h>

int main()
{
    char c[3]="Hi";
    printf("%zu", strlen(c));   //returns 2 (not 3!?)
    getchar();
}

Answer: Because in the first case c[] holds only 'H' and 'i', with no terminating zero. strlen looks for a zero at the end and, depending on exactly what lies after c[] in memory, finds one sooner or later, or crashes. We can't say which without knowing exactly what is in the memory after the c[] array.

Question 2:

How is a string variable different from a char array? How do I declare them with the minimum required index that still allows a \0 to be stored, if any (please see the code below)?

char name[index] = "Mick";   //should index be 4 or 5?

char name[index] = {'M', 'i', 'c', 'k'};   //should index be 4 or 5?

Answer: It really depends on what you want to do. Probably 5 if you want to actually use the content as a string. But nothing says you can't store "Mick" in a 4-character array. You just can't use strlen to find out how long it is, because strlen will read on to a fifth byte and quite possibly (much) further, and if there is no zero in the next several memory locations, it could lead to a crash, because eventually there won't be valid memory addresses to read.

#define name "Mick" //what is the size? Is there a \0?

This has absolutely no size at all until you use name somewhere. #defines are not part of what the compiler sees: the preprocessor replaces name with "Mick" wherever you use name, and hopefully that's in a place the compiler can make sense of. Then the same rules apply as in the previous answer: it depends on how you use the array of characters. For correct operation with strlen, strcpy, and nearly all other str... functions, you need a zero at the end.

Question 3:

Does the terminating null ONLY follow strings but not char arrays? So the actual value of the string "Hi" is [H][i][\0] and the actual value of the char array "Hi" is [H][i]?

Yes, no, maybe. It all depends on how you USE the "Hi" string literal (that's the technical name for 'something within double quotes'). Wherever there is room, the compiler puts a zero at the end. But if you initialize an array of a given size, it stuffs the bytes in there, and if that size leaves no room for the zero, that's your problem, not the compiler's.

Question 4:

Suppose c[2] is going to store "Hi" followed by a \0 (not sure how this is done, using gets(c) maybe?). So where is the \0 stored? Is it stored "somewhere" after c[2] to become [H][i]\0 or will c[2] be appended with a \0 to become c[3] which is [H][i][\0]?

In c[2], beyond the 'H' and 'i', there is no telling what is stored [technically, it could well be "the end of the earth"; in computer terms, that's memory that can't be read, in which case strlen on that WILL crash your program, because strlen reads beyond the end of the earth]. But it could also be a zero, a one, the letter 'a', the number 42, or any other 8-bit [1] value.

It is quite confusing that sometimes there is a \0 following the string/char array and causes trouble when I compare two variables by if (c1==c2) as it most likely returns FALSE (0).

If c1 and c2 are char arrays, that will ALWAYS be false, since c1 and c2 are never going to have the same address: when you use an array in C that way, it becomes "the address in memory of the first element in the array". So no matter what the contents of c1 and c2 are, their addresses can never be the same [because they are two different variables, and two variables cannot have the same location in memory. That's like trying to park two cars in a parking space large enough for only one car, and no, crushing either car is not allowed in our thought experiment].

[1] Char isn't guaranteed to be 8 bits. But let's ignore that for now.



Reply 3:

Running source code 1 is undefined behavior because strlen() requires a NUL-terminated string, which char c[2] = "Hi"; /* same as { 'H', 'i' } */ is not. A string differs from a char array in that a string is a char array with at least one NUL byte somewhere in the array.

The remaining answers should follow easily from this fact.

To autosize a char array to match the size of a string literal at initialization, simply specify no array size:

char c[] = "This will automatically size the c array (including the NUL).";

Note that you cannot compare the contents of char arrays with the == operator, which only compares their addresses. For strings you have to use strcmp():

if (strcmp(c1, c2) == 0) {
   /* Equal. */
} else {
   /* Not equal. */
}


Reply 4:

strlen() counts characters up to the terminating \0, and in C every string must be \0-terminated. With char c[2] = "Hi"; you have given room for the 2 characters H and i but none for the \0, hence you get undefined behavior in strlen(). In the case of char c[3] = "Hi"; there is a \0 in the third position, and strlen() calculates the actual length.

How to declare them with the minimum required index numbers allowing \0 to be stored, if any?

When you are not sure about the size of the char array, do it like this:

char c1[] = "Mike";    // strlen = 4
char c2[] = "Omkant";  // strlen = 6

NOTE:

EDIT: In the above case, where no size is given explicitly, do not confuse sizeof with strlen().

strlen() returns only the number of characters; sizeof gives the number of characters plus one more (for the \0 character).

So for arrays declared this way, sizeof gives exactly 1 more than the number returned by strlen().



Tags: c string arrays