A few weeks ago I started learning the programming language C. I have knowledge in web technologies like HTML/CSS, JavaScript, PHP, and basic server administration, but C is confusing me. To my understanding, the C language does not have a data type for strings, just characters; however, I may be wrong.
I have heard there are two ways of declaring a string. What is the difference between these two lines of declaring a string:
a.) char stringName[];
b.) char *stringName;
I get that char stringName[]; is an array of characters. However, the second line confuses me. To my understanding, the second line makes a pointer variable. Aren't pointer variables supposed to be the memory address of another variable?
In the C language, a "string" is, as you say, an array of char. Most string functions in the C standard library expect the string to be "NUL terminated", meaning the last char of the string is a 0. Not the code representing the numeral zero, but the actual value of 0.

For example, if your platform uses ASCII, then the byte values 65, 66, 67, 0 form the "string" "ABC".
When you use the char varName[] = "foo" syntax, you're allocating the string on the stack (or, if it's in a global space, you're allocating it globally, but not dynamically).

Memory management in C is more manual than in many other languages you may have experience with. In particular, there is the concept of a "pointer".
Now, a char * is "an address that points to a char or char array". Notice the "or" in that statement: it is important for you, the programmer, to know which case applies.

It's also important to ensure that any string operations you perform don't exceed the amount of memory you've allocated to a pointer. For example, copying "12345" into a buffer of only 5 characters is a mistake: "12345" is actually 6 characters long (don't forget the 0 at the end), so the copy writes one byte past the space that was reserved. This is what's called a "buffer overflow", and it is the cause of many serious bugs.

The other difference between "[]" and "*" is that one is creating an array (as you guessed). The other one is not reserving any space (other than the space to hold the pointer itself). That means that until you point it somewhere that you know is valid, the value of the pointer should not be used, for either reading or writing.
Another point (made by someone in the comments):
You cannot pass an array as a parameter to a function in C. When you try, it gets converted to a pointer automatically. This is why we pass around pointers to strings rather than the strings themselves.
In C, a string is a sequence of character values followed by a 0-valued byte [1]. All the library functions that deal with strings use the 0 terminator to identify the end of the string. Strings are stored as arrays of char, but not all arrays of char contain strings.

For example, the string "hello" is represented as the character sequence {'h', 'e', 'l', 'l', 'o', 0} [2]. To store the string, you need a 6-element array of char (5 characters plus the 0 terminator). You can write the size explicitly, as in char greeting[6] = "hello", or leave it out, as in char greeting[] = "hello".
In the second case, the size of the array is computed from the size of the string used to initialize it (counting the 0 terminator). In both cases, you're creating a 6-element array of char and copying the contents of the string literal to it. Unless the array is declared at file scope (outside of any function) or with the static keyword, it only exists for the duration of the block in which it was declared.

The string literal "hello" is also stored in a 6-element array of char, but it's stored in such a way that it is allocated when the program is loaded into memory and held until the program terminates [3], and is visible throughout the program. When you write char *greeting = "hello"; you are assigning the address of the first element of the array that contains the string literal to the pointer variable greeting.

As always, a picture is worth a thousand words. Here's a simple little program:
And here's the output:
Note the difference between sizeof and strlen: strlen counts all the characters up to (but not including) the 0 terminator.

So here's what things look like in memory:
The string literal "hello" is stored at a very low address (on my system, this corresponds to the .rodata section of the executable, which is for static, constant data). The variables greeting and greetingPtr are stored at much higher addresses, corresponding to the stack on my system. As you can see, greetingPtr stores the address of the string literal "hello", while greeting stores a copy of the string contents.

Here's where things can get kind of confusing. Let's look at the following print statements:
greeting is a 6-element array of char, and greetingPtr is a pointer to char, yet we're passing them both to printf in exactly the same way, and the string is being printed out correctly; how can that work?

Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize another array in a declaration, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.

In the printf call, the expression greeting has type "6-element array of char"; since it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to char" (char *), and the address of the first element is actually passed to printf. In other words, it behaves exactly like the greetingPtr expression in the next printf call [4].

The %s conversion specifier tells printf that its corresponding argument has type char *, and that it should print out the character values starting from that address until it sees the 0 terminator.

Hope that helps a bit.
1. Often referred to as the NUL terminator; this should not be confused with the NULL pointer constant, which is also 0-valued but used in a different context.
2. You'll also see the terminating 0-valued byte written as '\0'. The leading backslash "escapes" the value, so instead of being treated as the character '0' (ASCII 48), it's treated as the value 0 (ASCII 0).
3. In practice, space is set aside for it in the generated binary file, often in a section marked read-only; attempting to modify the contents of a string literal invokes undefined behavior.
4. This is also why the declaration of greeting copies the string contents to the array, while the declaration of greetingPtr copies the address of the first element of the string. The string literal "hello" is also an array expression. In the first declaration, since it's being used to initialize another array in a declaration, the contents of the array are copied. In the second declaration, the target is a pointer, not an array, so the expression is converted from an array type to a pointer type, and the resulting pointer value is copied to the variable.

In C (and in C++), arrays and pointers are used similarly; in most expressions an array is converted to the address of its first element (which is sufficient to gain access to the other elements, since elements are contiguous in memory within an array). This also means that the resulting pointer does not, by itself, indicate where the array ends, and thus you need some way of identifying the end of the array, either by passing around the length as a separate variable or by using some convention (such as a sentinel value placed after the last element to indicate the end). For strings, the latter is the common convention, with '\0' (the NUL character) indicating the end of the string.